From the Linguistics Coordinator



How can Notes on Linguistics best meet the needs of the field linguist? That
is the question we continually want to keep in mind as we assemble each 
issue. Different people will answer this in different ways, of course, and I 
am quite open to ideas about how to improve this publication. 

For the next few issues, and possibly beyond, I would like to try something a 
little different than usual, and that is to have ‘thematic issues’. These issues 
will have two or three articles which focus on one topic, one relevant and 
useful to field workers. In future issues, I expect to see articles dealing with 
dictionaries, with experimental linguistics (even without special 
equipment!), with the uses of historical and comparative linguistics, with 
computer tools, and so on. Suggestions for future themes are most welcome. 

In this issue, we focus on the topic of archiving language data. As notebooks 
get older, as papers in the boxes under your beds age and turn yellow, as 
tapes crack and degrade, as you’ve lost the copy of the computer program 
you used in the 1980s, the problem of preserving our language data becomes 
more and more acute. It is crucial to have strategies in place to preserve the 
language data we have worked so hard to collect. 

So here we present both an article and an interview with Joan Spanne, who is 
in charge of SIL’s Language and Culture Archives. She discusses the
urgency and the challenges of archiving data (it’s not the same as backing up 
your computer files) as well as some ‘how-to’ tips. We also include a report 
by Albert Bickford on an interesting conference on internet archiving, and a 
proposal to create a directory through which all language materials on the 
internet can be located. I hope you find these interesting and useful. 

One other item I would like to include more of is recognition of significant 
linguistic achievements by SIL members. We have traditionally included 
dissertation abstracts in NOLx, but some grammars and dictionaries involve 
just as much work, and are also worthy of recognition. If you know of such 
a work recently published by an SIL member, please let us know. 

Finally, in this first issue of NOLx for which I am editor, I would like to
gratefully acknowledge the help of Eugene Loos and Betty Philpott. They
do most of the work, and I am thankful to have them around. 

Michael Cahill 
International Linguistics Coordinator 
The linguist’s role in archiving linguistic data resources

Joan Spanne

Director of SIL Language and Culture Archives

1. Introduction. ‘What do I need to do to archive my linguistic
materials?’ Several converging factors are leading more and more linguists 
to seek answers to this complex question. Some of these factors are: 

• The rate of language change (and language death), which heightens the 
urgency of preserving information about minority and endangered 
languages (a supply factor — we must not lose what evidence we have). 

• The awareness that the language community, as well as individual 
speakers of a language of study, have an interest in the disposition 
(preservation, access, use) of research materials about their linguistic 
and cultural heritage (an ethical factor). 

• The expanding availability of sophisticated analytical tools in the 
linguist’s arsenal, which enable the researcher to apply diverse methods 
for study and analysis to source texts and data sets (a demand factor). 

• The mounting evidence that data resources of the computer era are 
relatively fragile and short-lived — easily corrupted or made obsolete by 
advancing technologies which are not fully backward-compatible (a 
time factor: the critical time frame for instigating preservation strategies 
is much shorter than in the pre-computer era) (Rothenberg 1999;
Bearman 1999). 

Certainly other pressures for archiving also exist, depending on the 
circumstances of the researcher. As a result of these pressures, the linguistic 
research community is beginning to work together to formulate standards 
and best practices for resource description, preservation, access, and tool 
development which will best serve the needs of all the interested parties: 
researchers, language communities, software developers, archival 
repositories (and their supporting institutions) and even businesses with 
interests in linguistic computing. The Open Language Archives Community 
is one significant forum in which this development is taking place.¹ OLAC
and its participating archival repositories and research institutes are working
on answers to the complex questions involved in archiving all types of
language research materials and enabling access and use through
Internet-based repositories, taking advantage of the vastly increased capacity
for sharing complex information resources which the growth of the Internet
affords. This article is intended to aid linguists and anthropologists in
preparing their primary source materials for deposit with an archival
repository, so that these valuable resources can be preserved for the long
term and made accessible to other users.

¹ The Open Language Archives Community and the December 2000 workshop ‘Web-Based
Language Documentation and Description’ which launched it are the subject of another paper in
this journal issue and will not be described in depth here.

2. What should be archived, and what formats are best? Archiving 
everything might be a goal in an ideal world, but in the real world of limited 
resources for storage and management, and limited time for all the work of 
describing, maintaining, finding and using materials, a certain amount of 
selectivity in archival work is necessary. The researcher preparing materials 
for deposit in some repository is the first-line selector, separating the wheat 
from the chaff among the materials he/she has collected and developed in the 
research process. Table 1 gives a list of the more prominent types of 
language resources desirable for archiving, though the list is not intended to 
be comprehensive. This article focuses on archiving primary source 
materials: recordings and transcriptions, lexical data, word lists, original 
texts, and field notes. 



Table 1: Resource types and formats

Type of material              Format sought for archiving
----------------------------  ------------------------------------------------
Language text, e.g.,          Paper print-out from a formatted document that
transcription of recording,   includes fonts correctly rendered; Standard
original written text,        Format or tagged (XML) text
translated text

Word list                     Paper print-out from a formatted document that
                              includes fonts correctly rendered; Standard
                              Format or tagged (XML) text

Sound recording               Magnetic tape recording (cassette or
                              reel-to-reel); WAV file

Lexical or anthropological    SFM or XML tagged file with supporting
data file                     settings files and descriptive documentation

Descriptive or analytical     Paper print-out from a formatted document that
document, e.g., working       includes fonts correctly rendered; HTML, Rich
paper, article, report        Text Format, Portable Document Format

Font and character set        Font files with complete printed description of
rendered by it                code points and characters rendered

For all textual materials, the documentation of character encoding is critical;
be sure that any specialized fonts needed to render the data are also included 
and documented. A paper print-out is a very important resource even for 
materials to be used in digital format in computer-based analysis, as it 
provides an accuracy check for rendering the computer data. 
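
As one way to prepare the ‘printed description of code points and characters’ mentioned in Table 1, the short Python sketch below lists every non-ASCII character found in a text, with its code point, standard name, and frequency. It assumes the data has already been decoded into Unicode text; for a custom legacy font, the same kind of listing would have to be made over raw byte values and matched by hand against the printed glyphs. The function name and the sample string are hypothetical, for illustration only.

```python
import unicodedata
from collections import Counter


def code_point_inventory(text: str) -> list[tuple[str, str, int]]:
    """Return (code point, Unicode name, count) for each non-ASCII character."""
    counts = Counter(ch for ch in text if ord(ch) > 127)
    rows = []
    for ch, n in sorted(counts.items()):
        # Characters without a standard name (e.g. private-use code points)
        # are flagged so that they can be documented by hand.
        name = unicodedata.name(ch, "UNNAMED (document by hand)")
        rows.append((f"U+{ord(ch):04X}", name, n))
    return rows


# Hypothetical sample text, not taken from any real language file.
sample = "ŋa ɓe ŋa"
for code, name, n in code_point_inventory(sample):
    print(code, name, n)
```

A listing like this, printed and deposited alongside the font files, gives the repository something to check the rendered data against.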

The value of untranscribed recordings from unknown (undocumented) 
speech events or speakers is relatively low, as so many factors work against 
another researcher being able to make use of them. Questionable 
transcriptions (e.g. those made using inconsistent conventions or before 
reasonable familiarity with the language was developed) are also of 
relatively low value, especially if the source recording is lost. A significant 
factor in favor of preserving such materials would be the relative rarity of 
materials in the language— the fewer the number of resources in or about a 
particular language, the more valuable those few resources are. In such a 
case, additional work on such materials by someone quite knowledgeable in 
the language probably will be necessary. 

Descriptive and analytical works in long-outdated proprietary computer 
formats are usually best preserved in print format, as the accurate recovery or 
conversion and maintenance of such works in computer format is very 
difficult. For these works, the value lies in their intellectual content, rather 
than in the capability to use the file in some computer-based processing. The 
archival repository will be responsible for determining the most suitable way 
of making print materials available to its users (perhaps through 
photocopying or scanning).

Tape recordings on magnetic media require special work to preserve them 
and careful procedures to reformat them digitally. This work should be 
carried out by the repository or a preservation specialist within strictly 
controlled parameters. For these, and any materials deposited with an 
archival repository, the depositor may request copies of reformatted works 
(e.g. a copy of a digitized sound recording made from an original magnetic 
recording) if he or she desires. 

3. Describing linguistic research materials. In order to be able to find a 
resource in an archival repository, to manage access to the resource, to 
preserve it through generations of technological change and to know how to 
use it once obtained, the resource needs to be described in precise ways. The 
essential work for the linguist is to understand and provide the information 
needed as much as is realistically possible. The rest of this article discusses
the common elements used in describing resources, and suggests some
relatively simple ways that the researcher can systematically organize this
information.

² Documenting the characteristics of digital audio files for archival purposes is a subject
requiring more expertise than I can claim and more space than is available here.

4. Metadata Categories. A description of an archived resource is also 
known as its METADATA and is composed of many discrete pieces of 
information, or ELEMENTS. A metadata SCHEMA is a definition of the 
specific elements used, their precise meanings (including the ranges of 
meanings they might have), rules of use and relationships among them.³ The
OLAC Metadata Set is a proposed standard for a basic level of resource 
description to be used by data providers (archival repositories) in the Open 
Language Archives Community (Simons and Bird 2001). It is still a draft 
under development and not yet a stable recommendation, but it is a very 
good starting point for this discussion. 

The OLAC Metadata Set (draft of 25 April 2001) contains 24 elements. Of 
the 24 in the OLAC set, five pertain specifically to software resource 
description and are not treated here. The prose descriptions here are 
intended to aid general understanding of their use and significance. 

• Contributor: the name of an individual or organization which has
contributed to the resource but is not primarily responsible for its
creation. Information about a person or service which has performed 
physical conservation work, reformatting, or other work to make the 
resource more useful or rescue it from digital obsolescence might also 
be noted here. If possible, further specify the role of this entity, e.g. 
‘sponsor’ or ‘service bureau for reformatting’. 

• Coverage: the spatial location or temporal period to which the resource 
pertains. If location can be reasonably predicted from the language 
identification, it is not necessary to specify it here, though noting the 
country provides a check on the correct assignment of language code if 
the name is ambiguous. 

• Creator: the name of an individual, collective group, or organization
which is primarily responsible for creating or compiling the intellectual
content of the resource. For a sound recording, this would typically
name the language consultant(s) or performer(s) recorded; for a
transcription, this would name the transcriber. If possible, further
specify the role of this entity through a term such as ‘performer’,
‘transcriber’, or ‘compiler’.

³ Those who wish to delve deeper into this general topic will find a good introduction to
metadata systems in McKemmish, Cunningham, and Parer 1999.

• Date: a date associated with an event in the history of the resource: date
of creation, date of format conversion, etc. Where possible, specify 
year, month, and day, but if this is not possible, give an approximate 
date by year. More than one date (and event) might be needed in order 
to trace the development of the resource in a way that is useful to 
another researcher. 

• Description: a prose description of the contents of the resource, such as
an abstract, table of contents, or note about physical characteristics. 
Special circumstances surrounding the work can be given here, if the 
information does not fit in another element (such as format or type), or 
requires more explanation than can be accommodated in that element. 
For a collection of texts, this might give the number of texts or 
recordings, a generic physical description, and a list. 

• Format: the physical medium or digital manifestation of the resource.
This might be ‘paper manuscript’, ‘PDF file’, ‘SFM file’, or ‘cassette 
recording’ and can include the size or duration of the resource in pages, 
bytes, number of entries, hours/minutes/seconds, etc. 

• Format.encoding: for a digital textual resource, it is critical to identify
the character set used. A unique font and its encoding developed for a 
particular language will constitute a separate resource to be archived 
along with other resources. 

• Format.markup: for a structured textual or multimedia resource such
as an interlinearized text (perhaps done in Shoebox 4.0), documentation 
of the tagging used is essential for future users of the resource. 

• Identifier: this is the means by which the repository will
unambiguously identify this specific resource. It will be assigned by the
repository, but the depositor may want to make a note of it for ease of
future access. If you have developed your own identification system for
your materials (and labeled them), the repository will also benefit from
having this information, particularly if it helps to link related materials
(recording with transcription, etc.).

• Language and Subject.language: These can perhaps best be
distinguished as ‘commentary language’, the language in which analysis
and/or description is given (or the language of the intended audience for
the work) and ‘language under study’, a language that is the topic of
description or analysis. In order to avoid ambiguity, it is very helpful
for the repository to have these (particularly Subject.language) specified
by its Ethnologue code, or some other standard coded identifier for
languages, such as ISO 639 (ISO 1998).

Though not yet incorporated into the OLAC Metadata set (or into the 
structure of its controlled vocabulary for language identification, which 
is the Ethnologue), it is also useful for the repository to have 
information regarding the linguistic family and stock to which this 
language belongs. 

• Publisher: this pertains only to a resource that is already available to
the public through some common distribution channel, such as a 
published dictionary. Publication (and distribution) of a work is one of 
the rights reserved to a work’s creator, but sometimes transferred in 
whole or in part in the publication contract; information about the 
(previous) publication of a work is not only helpful to users wishing to 
obtain the resource, but also important to the repository in managing the 
resource and making it available in accordance with any intellectual 
property claims. 

• Relation and Source: these two can be a bit difficult to sort out. The
intent is to capture information about another resource that is related but 
independent (use of this resource is not strictly dependent on having the 
related resource). Use SOURCE where the intellectual content is derived 
from another resource but is now of a different TYPE and additional 
creative work was involved in making the new resource. For example, 
the SOURCE metadata for a transcription would list the sound recording 
on which the transcription is based. Use RELATION where giving 
information about a previous or succeeding version or different FORMAT 
of the content, such as a PDF file of a printed document. 

• Rights: this is a statement regarding who has what rights or
permissions to access, use, distribute, or make other works derived from
this resource. Whatever is known about specific agreements made with 
the originators of the intellectual content of the resource (a language 
consultant, another researcher, a language community) should be made 
clear to the receiving repository. 

• Subject: This contains keywords that describe the topical content of the
resource. This might refer to a linguistic theory on which an analysis is
based or a sociolinguistic or anthropological concept dealt with in the
work. This can also be used for a subject term or phrase for the topic of
a discourse, such as ‘canoe building’ or ‘healing ceremony’.

• Title. The formal name by which the resource is known and would be
cited. For a title given in the language under study (Subject.language),
it is also helpful to supply a gloss of the title.

• Type. This identifies the broad category to which this resource belongs, 
such as Collection, Dataset, Graphic Image (such as for a photograph or
JPEG file), Software, Signal (Sound or Video), or Text. For 
Collection it is helpful to specify the type(s) of the items contained in 
the collection. A scanned image of a page of text is identified as Type: 
Text, since it is intended to be interpreted by the user as text rather than 
as a representational picture. In most cases Type can be deduced from 
the Format information, but in some cases that is still ambiguous, as in 
the example of a scanned page of text which can be of the same Format 
as a scanned photograph. 

• Type.data: This element identifies the nature or genre of the resource
in specifically linguistic or ethnographic terms.⁴ Broad categories
considered here are ‘Transcription’, ‘Annotation’, and ‘Description’.
Subcategories can be given to identify the type of transcription and/or
annotation, e.g. phonetic or practical orthography, or the genre of a
written text or recorded speech event, e.g. personal narrative,
conversation, sermon.

It may be difficult to decide whether a particular bit of information belongs 
in one element or another. Provide whatever information you can for each of 
these, associate it as best you can with the most logical element and try to be 
consistent in your choice of element across your descriptions of different 
resources. In case of confusion, try to give more explanation (rather than 
assuming one meaning over another), and let the repository worry about 
fitting the bits into the proper boxes in the best way. 
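
One relatively simple way to keep descriptions consistent across resources is to hold each one in a fixed template, so that every element is either filled in or visibly blank. The Python sketch below is a minimal illustration under stated assumptions: the element names follow the prose list above, but the tuple, the function names, and the sample values are hypothetical and are not the official OLAC schema.

```python
# Element names follow the prose list above. The tuple itself is an
# illustrative assumption, not the official OLAC schema.
ELEMENTS = (
    "Title", "Creator", "Contributor", "Date", "Language",
    "Subject.language", "Description", "Coverage", "Format",
    "Format.encoding", "Format.markup", "Identifier", "Type",
    "Type.data", "Subject", "Source", "Relation", "Publisher", "Rights",
)


def make_record(values: dict[str, str]) -> dict[str, str]:
    """Return a description with every element present (blank if not given)."""
    unknown = set(values) - set(ELEMENTS)
    if unknown:
        # Catch misspelled element names early instead of silently losing data.
        raise KeyError(f"not an element in this sketch: {sorted(unknown)}")
    return {name: values.get(name, "") for name in ELEMENTS}


def blank_elements(record: dict[str, str]) -> list[str]:
    """List the elements still to be filled in, in template order."""
    return [name for name in ELEMENTS if not record[name].strip()]


# Hypothetical sample description.
transcriptions = make_record({
    "Title": "Transcriptions of Language-X sermon recordings",
    "Format": "Standard Format",
    "Type": "Collection: Text",
    "Type.data": "Transcription/practical orthography",
})
print(blank_elements(transcriptions))
```

A blank element is not necessarily an error; the point of the listing is only to make each omission a deliberate choice rather than an oversight.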

5. Seems like a lot of work... Documenting language resources may seem 
like it involves a lot of tedious work. (Congratulations on reading this far in 
the article!) That perception is accurate, but there are some ways to ease the 
burden. The best way to simplify the work of documentation is to do it as 



⁴ This category is heavily influenced by the interest and focus of the researcher and of the
repository. A good example of this is the Archive of the Indigenous Languages of Latin
America. AILLA is developing a metadata set specifically designed for their collection of
recordings, transcriptions, and translations, mostly of naturally occurring discourse. Their
focus is primarily on ethnography of discourse, and so the metadata which their system uses
reflects and supports this focus (Michael 2000).
Title             Collection of Language-X sermon recordings

Creator           Name(s) of speaker(s) (with their permission); if the
                  speaker does not wish to be identified, it is helpful to
                  have something like ‘a Language-X church elder’

Contributor       Name(s) of individual(s) or group(s) that recorded the
                  speech events (probably yourself among them)

Date              Time period over which the recordings were made

Language          Name and Ethnologue code of Language-X

Subject.language  Leave blank, since the Language element is sufficient

Description       53 recordings:
                  1. Title. Title gloss. Date. Speaker. Length
                  2. Title. Title gloss. Date. Speaker. Length

Coverage          Country where Language-X is spoken

Format            Cassette

Format.encoding   Not relevant for non-digital resources

Format.markup     Not relevant for non-digital resources

Type              Collection: Sound recording

Type.data         Sermon; Religious oratory; Hortatory speech as appropriate

Subject

Source

Relation

Publisher

Rights            The church elder who originally delivered these sermons
                  wishes them to be available to members of his language
                  community and to scholars, but requests that permission
                  from the Y Church leadership be obtained before publicly
                  airing any sermon or publishing any transcripts.



Title             Transcriptions of Language-X sermon recordings

Creator           Your name or other primary transcriber(s)

Contributor

Date              Dates of transcription

Language          Name and Ethnologue code of Language-X

Subject.language

Description       53 transcriptions of recordings:
                  1. Title. Title gloss. Date. Speaker. No. of pages.
                     Reference to match with specific recording
                  2. Title. Title gloss. Date. Speaker. No. of pages.
                     Reference to match with specific recording

Coverage          Country where Language-X is spoken

Format            Standard Format

Format.encoding   Identifier for the character set used

Format.markup     Description of sf markers used, or reference to where
                  they are documented.

Type              Collection: Text

Type.data         Transcription/practical orthography

Subject

Source            Collection of Language-X sermon recordings submitted with
                  transcriptions

Relation

Publisher

Rights            The church elder who originally delivered these sermons
                  wishes them to be available to members of his language
                  community and to scholars, but requests that permission
                  from the Y Church leadership be obtained before publicly
                  airing any sermon or publishing any transcripts.



Title             Free translations of Language-X sermon recordings

Creator           Your name or other primary translator

Contributor

Date              Dates of translation

Language          Language of translations

Subject.language  Name and Ethnologue code of Language-X

Description       48 translations of recordings/transcripts:
                  1. Translated Title. Original title. Date. Speaker. No. of
                     pages. Reference to match with recording
                  2. Translated Title. Original title. Date. Speaker. No. of
                     pages. Reference to match with recording

Coverage          Country where Language-X is spoken

Format            Rich Text Format

Format.encoding   ASCII

Format.markup

Type              Collection: Text

Type.data         Sentence level free translation

Subject

Source            Collection of Language-X sermon recordings submitted with
                  translations (48 of 53 recordings were translated)

Relation

Publisher

Rights            The church elder who originally delivered these sermons
                  wishes them to be available to members of his language
                  community and to scholars, but requests that permission
                  from the Y Church leadership be obtained before publicly
                  airing any sermon or publishing any transcripts.



It is tempting for the researcher to conclude that much of the descriptive 
information could be ‘figured out’ by the archival repository once the
materials are in their custody. This may be true of some elements of the 
description, but most of the metadata really cannot be deduced easily (or 
even at all) from the resources themselves, without risk of error and a lot of 
time spent eliminating other possibilities. 
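
The three example descriptions above are tied together through their Source elements: both the transcriptions and the free translations derive from the collection of recordings. If the depositor labels each resource, a few lines of Python can confirm that every Source points at something actually being deposited. This is only a sketch under stated assumptions: the identifiers (LX-REC and so on) are hypothetical depositor labels, not repository-assigned identifiers, and the records carry only the elements needed for the check.

```python
# Hypothetical depositor labels; a repository would assign its own identifiers.
records = [
    {"Identifier": "LX-REC",
     "Title": "Collection of Language-X sermon recordings",
     "Source": ""},
    {"Identifier": "LX-TRN",
     "Title": "Transcriptions of Language-X sermon recordings",
     "Source": "LX-REC"},
    {"Identifier": "LX-FTR",
     "Title": "Free translations of Language-X sermon recordings",
     "Source": "LX-REC"},
]


def unresolved_sources(recs: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Return (identifier, source) pairs whose Source matches no deposited record."""
    known = {r["Identifier"] for r in recs}
    return [(r["Identifier"], r["Source"])
            for r in recs
            if r["Source"] and r["Source"] not in known]


def derived_from(recs: list[dict[str, str]], identifier: str) -> list[str]:
    """List the identifiers of resources that name the given resource as Source."""
    return [r["Identifier"] for r in recs if r["Source"] == identifier]


print(unresolved_sources(records))      # an empty list means every link resolves
print(derived_from(records, "LX-REC"))  # resources based on the recordings
```

This is exactly the kind of relationship that cannot easily be deduced from the materials themselves once they have left the researcher’s hands.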

6. Archiving: a partnership. Archiving is essentially a partnership: the
depositor and the repository working together to organize, describe, and 
preserve valuable resources in anticipation that someone will want to use 
them again in the future. In linguistic and anthropological research these 
resources are the products of years of labor and a significant relationship 
between a particular community and the researcher. Thus, a very important 
third — though sometimes silenced — member of the partnership is the 
language community and the individuals who have worked with the 
researcher. Often the repository must rely on the researcher to obtain and 
convey a record of their interests and desires relating to these products of 
both individual creativity and collective linguistic and cultural heritage. In 
its turn, the repository does its best to represent and enforce these interests 
and desires in the terms of use imposed on future users of the resources. 
Potential future users — members of the language communities themselves, 
other scholars, educators, government officials — will be looking for 
materials like these. The archival repository will have plenty of work to do 
to ensure that these resources remain viable and accessible. Getting off to 
the right start on this task will depend to a large extent on the descriptive 
information supplied by the people who have created them. 
The SIL Language and Culture Archive:
An Interview with Joan Spanne 

April 6, 2001 



Why archive? 

E: Joan, you have been entrusted with heading up SIL’s Language and 
Culture Archive.¹ Why should we have such an archive?

J: There are two very basic motivations within SIL: one of them is to keep 
the product of our work safe, both what is complete and what is not yet 
complete, for our own purposes, that is, for the continuation of the work. 
The other is that we want to make material available for those outside of SIL, 
to satisfy the academic service part of our mission. An archival repository 
needs to have a specific mission statement, one that supports the mission of 
the organization. A lot of knowledge about the repository’s context goes into 
decisions on what resources to collect, how to keep them and who may use 
them. I discuss that in an Archiving Guidelines 
document recently sent out to SIL entities. 

E: You mentioned ‘Guidelines’. Are the Guidelines
readily available to all the members of SIL? 

J: They are available to anyone within SIL and to those 
who are closely associated as partners, to read and 
comment on. They are not specifically for the OWL, 
the ordinary working linguist, who might be 
overwhelmed by them. But there are those owls who are interested in it and 
who could make very helpful comments, and so the Guidelines are not closed 
or confidential in any way; probably the easiest way to get a copy is to 
contact me. 

E: So they can request a copy by emailing you at joan_spanne@sil.org?

J: I can send a copy as an attachment, in a Word document, or for those who 
are working on Macintoshes, an HTML file. 




[Illustration caption: ‘What are Guidelines?’]

¹ The interviewer (E) is Eugene Loos.
Can a linguist get credit for ‘archiving’?

E: Taking ‘archiving’ to mean the two ways that you 
have mentioned in the Guidelines, as ‘storage for 
retrieval just in case something happens’ versus 
‘storage for retrieval by those who want it’, I would 
guess that most people probably would be glad to have 
the security side attended to by someone else. The part 
about ‘making information available’ really takes the 
burden off of them to ‘get quotable’ — make their work 
more broadly known. How can we go about seeing 
that an author really gets credit for what he/she has done? Someone might 
plagiarize what we have taken years to put together! 

J: Until 1978, protection under the law for the original linguistic or artistic 
expression — meaning the copyright — was tied to publication, or at least to 
the registration of a work with the proper national authority. From 1978 
onwards, the protection of copyright became inherent in the act of creation, 
so that unpublished or unregistered works are also protected. In other words, 
it is not only unethical, it is illegal for a researcher to publish as his own the 
work of another researcher. The Archives can help establish original 
authorship through the records it keeps of materials deposited and the 
measures it takes to protect those materials from unauthorized copying or 
theft. 

E: So if someone uses something available in an archive, anyone can double- 
check what the user is doing to see whether he has given proper recognition 
for what he has used? 

J: Yes. 

E: That’s a wonderful comfort, I think; an assurance for the field worker. I 
would think that teams who have been on the field 5, 10, 15, or more years 
might now have an accumulation that they would find it a fearsome task to 
catalogue. 

Who is going to do the depositing? 

E: Just think of all the word lists and surveys and so 
on. Isn’t that just too much to ask of anyone? 

J: There are two answers to that question and they’re tied together. It is a
lot of work. It must be recognized by the branch that preparing materials to
be archived is important work, it’s not ‘if you have time’. The entity that is
responsible for overseeing the work should make archiving an expected part
of a field linguist’s assignment.

[Illustration captions: ‘It took me 45 years!’ and ‘My CV!’]



Isn’t description an onerous task? 



E: OK, well, the Peru Branch has had an open archives policy so they 
microfiched everything to make it all available, but nothing was tagged. 
Perhaps the name of the text or something is there, but how would we go 
back and log in all that microfiched stuff — some of which is scarcely legible? 

J: We have some of those materials in the L&C Archive because when the 
microfiche was made, copies of the fiche were sent to Dallas. And you’re 
right; some of those are not very legible. But there is internal information 
that a knowledgeable person can get out of the text that helps to identify it. 



E: That implies that someone would have to peek at it, to see what it is. 



J: Yes. In just about every case the researcher is going to need to have the
material in hand to decide whether it fills his or her need. There is perhaps
more information available concerning those microfiched materials than you
are aware of. There was some basic indexing done here in Dallas,
identifying as much as possible: the linguist involved, the language, the
date, and the type of material that is there. Very basic, of course, but it tells
more than that it is just a piece of paper or a set of microfiche, but in most
respects it is going to be the responsibility of someone wanting to use the
resource to do a fair bit of preparation to get the materials to the place where
they can be used.



Archive what? Old stuff? 

E: Well, let me ask about pages of data made long before computers were 
available. Maybe it is on old, yellowed paper, written in pencil; what can we 
do with that? 

J: There are three basic approaches for that older material: one is to save the 
piece of paper. I’m actually getting to be a proponent of archiving paper. A 
second alternative is to film materials, as has been done by some branches. 
Peru is not alone in that. The third possibility would be to scan material. 
Both filming and scanning require equipment and some degree of knowledge 
in order to do a good job. We’ve kind of grown wide-eyed at the
possibilities for distribution, making material available on the Internet, 
because scanned material is so easy to send around. The difficulty that I see 
with scanning is that, once you’ve performed the scanning work, you must 
continue to maintain the image files, along with a lot of structural 
information about how the files are related to each other. This is in addition 
to providing the information needed in order to find the resource in the first 
place, which is needed for a resource in any medium. There are many 
different bits of information, and keeping those bits together in an ordered 
manner is a lot of work, as you already know from your experience in 
scanning jobs that you have done. The same is true with microfiche, except 
that microfiche connects the material together in a physical medium that is 
much less subject to technological obsolescence. They are at least readable 
if you have a light source and a magnifying glass. The information that 
needs to be collected simply to preserve the work in a hardcopy form, 
whether still on paper or on a film, is not nearly as intensive as the work in 
uniting and preserving digital materials. 

E: What about early work? Everybody has first transcriptions, made perhaps 
before they settled into the phonemic patterns. What would you recommend 
concerning those early word lists and texts? 

J: That’s a tough question, an evaluative question that I as archivist would 
have a very hard time deciding on. That’s where I would find experienced 
linguists to ask for an opinion of the value of the material. I would say that if 
it is in a language in which there is a large body of materials that have been 
made available to publication or archival collection, then probably early 
work that might not be as reliable, either in transcription accuracy, or 
because of orthographic changes that have taken place, would not have a 
great value. We would have to look at the value of it in comparison to 
alternative sources that there are for data analysis. 

E: I have had the interesting experience that sometimes in looking at tentative 
word lists of languages for which there is nothing much available, if you can 
get a comparative set of early transcriptions by different people attempting to 
record the same thing, sometimes you can discern phonetic features that 
otherwise might not have been captured. Those are cases where early, rough 
stuff is quite useful. 
J: One thing that immediately jumped out to me in your statement was THERE
IS NOT MUCH AVAILABLE, and so the fact that it is early work may make it less 
reliable in some situations, but as you point out, it depends on your purpose, 
your use for it. The fact that there isn’t a whole lot available increases its 
value. For a language that has very few or no remaining 
native speakers that was studied perhaps 20, 30, or 50 
years ago, and therefore we’ll have no one collecting 
material anymore in that language, what we have is what 
we have. Then, yes, those types of materials could be 
very valuable, but again, I would look to a 
knowledgeable linguist who knows what has been 
produced in that particular area. If he/she says, ‘Yes,
this is rare stuff, we need to keep it,’ then we would keep it.

Archive in what form? 




J: Back to the question of medium, I’m more and more a proponent of 
keeping paper copies for some types of work. While working with Central 
America Branch’s material we had collections in paper copies and the digital 
files that were somehow related to the paper copies; either the hardcopy was 
typewritten copy that was later transcribed onto the computer, or in other 
instances we had the output produced ten years ago from computer files. We 
had to do a lot of going back and forth between the source and something 
produced from that source. Producing the files from some of those older 
computer files took a fair bit of effort, but it was not impossible. There are 
some problems with those computer files, characters, and fonts to be 
rendered; in some cases we could figure out what they were originally, to get 
a one-to-one correspondence, in other cases we’re not really sure. So a lot of 
that depends on the documentation that goes with those files. 

E: Like associated cc tables and fonts or printer drivers? 

J: Sometimes cc tables; in other cases a description of the process that was 
gone through in converting the materials because sometimes the files that we 
have now are actually conversions of much older materials, a very mixed 
bag. But with typescript originals we can pretty well make out what the 
author intended. When the interested party wants to get at the intellectual 
content of the materials, to read it and understand it, the paper is as good as 
or better than the computer files. When researchers want to put the material 
through some kind of further analytical process, usually done by computer 
these days, they want the computer source files. So they’re going to have to 
put the material that we have in print into a digitized form in order to apply 
the desired analytical process to them. But a journal article, a conference
paper, a report on findings, descriptions of analytical work done, are 
primarily documents for which the intellectual content is what the researcher 
would be interested in and so for those I think it is most appropriate to have 
them in hard copy. 

E: If you keep things in hard copy that come from all the branches, you’ve
got a storage problem! 

J: Yes, we do have a bit of a storage problem, but we are now talking about 
unpublished materials that are of such a nature that we don’t have the 
motivation to do the extra work of converting to computer files. Our 
Scripture materials are already archived in the printed form and also in 
computer form where such exists. We are committed to maintaining in the 
archives here any paper copy or microfilm that the branches have produced 
and we already have. And so when we eliminate that which has already been 
published, we’ve narrowed down the scope of the task for material on paper. 
There are types of material that we have strong motivation to keep in digital 
format, or to convert to digital format. Those would be language texts and 
data files, lexical files, files of field notes, recordings. How we deal with 
those is a whole separate question because we do have a high motivation to 
preserve them in digital format for future use. Our shoeboxes of 3X5 notes 
are far less usable, less accessible to the researcher than a lexical file in a 
computer database. 

E: I still want to pursue the idea of preserving hard copy. If you collect hard 
copy, how do you attend to the need of the researcher who must access it? 

J: In a lot of cases the researcher will be expected to come to Dallas. It is 
normal that a researcher must go to the place where he/she can obtain access 
and view the materials rather than, say, just requesting wholesale sets of 
materials. In other cases we will make photocopies available, because when 
we reach the stage of a researcher actually requesting specific materials, the 
motivation for doing something with those materials jumps up considerably. 
One of the reasons why I’m interested in preserving paper copy is that while 
there is definitely a space cost, there is a considerably lower labor cost, and 
labor is expensive, even in our organization. We are not used to counting the 
cost of the labor that processing requires. When we look at the cost and 
compare it to an unexpressed, unknown interest in the future, we have a hard 
time committing resources to the labor of putting it in a more usable format. 
When a request comes in for a body of materials for a language, then the cost 
benefit for dealing with that particular language goes up, and so in many 
instances it can become a partnership between the archives and the researcher
to make the material available in another form, and to work with that 
researcher who then makes an investment in the material and benefits from 
having access to it, which also returns some value for the benefit of future 
researchers. 

E: By making necessary copies, cataloguing, and so on? 

J: Yes. Even by hiring someone who actually keyboards material, then 
committing those computer materials to the archives, which can make them 
available. That way the archives actually participate in the production of the 
reformatted materials. The work of maintaining materials is greater than just 
receiving materials from an outside source. The archive needs to be in 
control of the format and the documentation from the structural and 
administrative perspective. 

E: I can envision that some researchers would be interested in having all the 
materials related to a particular language family. Those materials might not 
be currently accessible in any bundled set. But if he/she were to undertake 
the bundling of them and make that bundling be his/her contribution to the 
archive, it would be a very useful contribution — a way of utilizing efforts. 

What about updateable stuff? 

E: Would you archive something that possibly is going to be updated in a 
couple of years? 

J: A couple of years is on the margin of when I would say to archive it, 
because, the Lord willing, you’ll be able to complete the work. Yet, who 
really knows what is going to happen? It kind of depends on the project. If 
it will be five years before the next update, certainly, document it and put it 
in the archival collection. If it is something very much in process, or if you 
expect to get back to it in eight months, it’s really not in a form that you 
would want to make available to other people. In those cases I would say 
that is the kind of material that most assuredly you’d want to back up, not 
archive. 

E: If a person envisions that the material is rather static now but wishes to 
update later, will he have assurance that he can replace it with an updated 
version? 

J: Yes, we will need to be able to replace superseded material. In some 
instances the entities themselves might express the desire that the material 
not be updated, that subsequent versions just be added, to keep track of 
versions made. That’s a special case. Usually it will be appropriate to
remove the earlier materials because they’re not going to be of value for their 
linguistic content after the updated version is available. 

J: The tension between two things — work in progress versus work at a
hiatus — is one of the reasons why a good backup process complements the
archiving process, because you find a spectrum of materials. At one stage
work clearly belongs in a backup, not intended for anyone else, and then it
moves further along to the stage where it is not quite ready for publication
but is completed to an archiving level. It is
a judgment call. 

How much time might be needed for archiving?

E: How much time do you think it might take for the field worker to archive
an accumulation of data? 

J: That’s a loaded question. A lot depends on the documentation that is
already there, made right up front when the materials were created. If you 
have to rack your brain, asking ‘What is this?’ it will take more time than if 
you can find a clue and say ‘Ah! Look at this! I created a header that told me 
who this is, when I did it, the context of it, and (if it was a sound recording
and a transcription) who assisted me in the transcription,’ and ‘Oh, that helps me
because that person did something in this way, whereas another of the people 
did transcriptions in another way.’ That’s one reason why it’s important to 
think through the whole process from the beginning, which is difficult to do. 
You may be fully occupied doing your particular task and don’t want to be 
bothered by what use someone is going to make of it later. 
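
The kind of up-front header described here could be sketched in Python roughly as follows. This is only an illustration: the field names and the `# key: value` layout are assumptions made for the example, not a scheme prescribed by the Language and Culture Archives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileHeader:
    """Descriptive header for a field data file (hypothetical field names)."""
    speaker: str                                   # who this is
    created: str                                   # when the material was collected
    context: str                                   # circumstances of the recording or text
    language: str
    transcription_assistant: Optional[str] = None  # who helped transcribe, if anyone

    def render(self) -> str:
        """Return the header as plain-text lines to place at the top of the file."""
        lines = [
            f"# speaker: {self.speaker}",
            f"# created: {self.created}",
            f"# context: {self.context}",
            f"# language: {self.language}",
        ]
        # Only sound recordings with a transcription have an assistant to record.
        if self.transcription_assistant:
            lines.append(f"# transcription assistant: {self.transcription_assistant}")
        return "\n".join(lines)
```

Writing such a header when the file is created costs a minute; reconstructing the same facts years later is the part that takes the time.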

Can we compress before archiving? 

E: Some things, sounds and images in particular, are very voluminous on 
computer. So how can we expect to accept such voluminous stuff? We 
would soon run out of space in which to store it! 
J: Archives running out of digital space really isn’t that much of a problem.
Bytes for storage are just not that expensive anymore. They can be 
expensive to transmit, and so we have a desire for smaller, kinder, more 
compact processes of information when we want to transmit it over the 
Internet. Think of both an archival information copy and a distribution copy. 
The distribution issue may be solved by using a compressed format such as 
PDF or DjVu™ or some other compression method. A couple of weeks ago 
colleagues asked about putting materials on a CD and using PDF because of 
a request that they’d received from a university. I said ‘Yes, by all means, do 
that.’ Especially if they’re going to support you in doing the job. He will be 
making a product that will be useful to himself and more 
accessible to a wider audience. It fulfills a need and a 
desire of an institution with which one wants to develop a 
relationship. But don’t consider that to be archiving. That 
is a product, something that you produce from materials 
that you have archived in, shall we say, more secure 
means, in terms of preservation strategies. 

E: You mentioned PDF and DjVu. 2 Both are means of bundling and compressing
files. My experience with PDF is that you can’t get out of a PDF file the same
quality of picture as when you print out an original BMP or TIFF file,
especially if the PDF file is calibrated for the computer screen rather than a
high-resolution printer.

J: That is true. And that is one of several reasons why PDF isn’t considered 
to be an archival format. It is a distribution format. 




2 J: PDF is a file in Portable Document Format, produced through a software product called
Acrobat, from Adobe. Acrobat is used on a wide variety of text and files to produce something 
that packages together paged information that has very much the look and feel of a book or 
document on paper. It can be read on a variety of platforms using a software reader, so a PDF 
document is acceptable to Macintosh, Windows based systems, or UNIX based systems. In 
order to be able to open the document and read its contents all you need is an Adobe Acrobat 
reader. DjVu is another technology that has a freely available reader for Windows and 
Macintosh. DjVu is based on a digital imaging format designed specifically for capturing 
scanned images and publishing them on the World Wide Web. AT&T Research Labs originally 
developed the technology, which is capable of very high compression ratios. DjVu is a 
registered trademark of LizardTech, Inc. (http://www.lizardtech.com/), which offers a suite of
products for creating DjVu documents. A freely available reader can be downloaded at 
http://www.lizardtech.com/cgi-bin/products/desc.pl7tsb-25720. 
E: That brings up the question of whether a field linguist should use
something like ZIP to compress and bundle their files together first. 

J: The reliability of compression technology like ZIP is of sufficient level 
that I would accept into the archive materials that are zipped for transmission 
so as to make them as small as possible to get them to the archive. But for 
storing them in the archive, we won’t keep them in zipped format. We will
unpack them and store in the native format, after checking to see that they are
intact.
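
As a rough sketch of that receive, check, and unpack step, the following Python uses the standard `zipfile` and `hashlib` modules. The function name `unpack_deposit` and the choice of SHA-256 checksums are assumptions for illustration, not a description of the Archives’ actual procedure.

```python
import hashlib
import zipfile
from pathlib import Path


def unpack_deposit(zip_path, dest_dir):
    """Verify a zipped deposit, unpack it, and return SHA-256 checksums by file name."""
    with zipfile.ZipFile(zip_path) as bundle:
        # testzip() reads every member and returns the name of the first one
        # whose CRC check fails, or None when all members are intact.
        damaged = bundle.testzip()
        if damaged is not None:
            raise ValueError(f"damaged member in deposit: {damaged}")
        # Store the files in their native format rather than keeping the zip.
        bundle.extractall(dest_dir)
        names = [info.filename for info in bundle.infolist() if not info.is_dir()]
    checksums = {}
    for name in names:
        data = (Path(dest_dir) / name).read_bytes()
        checksums[name] = hashlib.sha256(data).hexdigest()
    return checksums
```

Keeping the returned checksums alongside the unpacked files gives a later check something to compare against.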

E: I suppose that most of us have made less-than-desirable recordings 
because a recording machine added noise or for other reasons the recording 
turned out to be less faithful than one might have desired. The recording can 
be cleaned up with something like CoolEdit. 3 Would you recommend
people clean up their files before submitting them? 

J: I would recommend that they send us the original. Or that they work with 
a reliable archive in doing any preservation work, including digitization of 
those files, partly because the more direct control that the archive actually 
has over conversion from a fragile medium such as tape into another format, 
the more they are going to know about the history of that material, and the 
better they are going to be able to preserve it and to document the action 
taken to preserve it. The documentation of those actions can be done, of 
course, by the depositors, and so I would say, ‘Participate with the archive.’ 

If it is something that you want to be involved in, great! The Language and 
Culture Archives would like to have the involvement of the individual 
researchers, the linguists, in the preservation of the material. I would like to 
be able to say that the Language and Culture Archives is in a position to act 
today on that preservation work. We’re not, but that is a significant project 
that I’m working on right now. 

Access to the archive 

E: That’s encouraging. Who will have access to archived material?

J: That depends on which specific material or specific collection is involved.
While a language program is in existence, and also while there is an entity 
supervising the work in the area, country, or region where that is going on, 
then it is the entity or supervisory staff that establishes any controls over who 
has access to the materials. Even if deposited materials originate in
Language Project A in Country C they are still controlled by the parameters
that are set by that entity or staff; there may be issues of sensitivity. If there
is a question of a researcher intending publication within a reasonable time
frame (say, 5 years), then those materials will be protected according to
parameters set by the entity from which the material originated.

3 CoolEdit is a product of Syntrillium Software Corporation.

Authorization to distribute 

E: I think that a significant use of the archive will be for comparative work in 
linguistics and anthropology, and so getting together all the available 
material, at least the unpublished materials, is a tremendous service. I have 
received a number of personal requests for materials with information on 
languages of the Panoan language family, the family to which Capanahua 
belongs. As years went by on the field, I latched on to Panoan word lists 
whenever possible. Later when I received requests for copies of those lists, I 
had to ask myself if those particular materials were authorized for 
distribution. So I had to refer back to the entities they originated in and get 
clearance to give them out. If a person gathers materials that are not 
necessarily his own, just gathering them because of interest, would we 
archive it? 

J: Well, that’s a very common situation, but when it comes to redistributing 
them, yes, you are right. There is a range of interests involved in those 
materials and that’s one of the gray areas to work out, in terms of 
documenting how soon they may be used, who has right over the material, 
and how those should be acknowledged — both in terms of the ethics of 
acknowledging another’s work, and also the legal aspects of the right of 
reproduction, distribution, and the right to create derivative works — all of 
that sort of thing. 

Published materials 

E: A field linguist’s material might include things already published. Will 
we archive published materials? And the source files of those publications? 

J: Materials published by SIL entities, including SIL International, have been 
archived for years, originally by the SIL Bibliographer, and now in the 
Language and Culture Archives. Academic Affairs is working to clarify the
relationships, rights, and responsibilities of SIL workers, SIL entities, 
International Administration, and our partner organizations regarding 
academic publishing and republication of works. Works by SIL authors, 
which have been published outside SIL, are also archived by the L&C 
Archives; we can only provide a copy of such a work in accordance with
‘Fair Use’ provisions of copyright law. 

E: That should be assuring for depositors. Let’s say that a book of native 
texts of a particular language has been published, and somebody wants a 
digitized copy of that, but the book is still available. What will be our 
policy; will we say he may not have the digitized copy because the published 
copy or book is available? 

J: Access to digitized copy of materials published by SIL is a marketing issue 
that Academic Affairs will address. Copyright rules determine how much 
access there may be to materials published outside SIL. 

Alternative archives 

E: You know, there are other similar archives under the auspices of different 
institutions; would we say a person should archive in some other place as 
well as SIL, or only in one institution? 

J: That depends partly on the capacity of the institutions to provide an 
archival environment that is secure for the long term and dedicated to making 
materials available. If that archive is going to be able to preserve the 
material and make it available, it’s appropriate for an SIL entity to develop a 
relationship with that archive. Such is the case in Australia, where the AAIB 
has more than a ten-year history of depositing material with a particular 
repository. They have committed to depositing material there; consequently 
we don’t have much of those materials here. They have been maintaining 
materials in a branch collection but that branch collection isn’t necessarily 
going to be transferred wholesale to another institution because they’ve had 
this progressive program in Australia. In some other countries where we’ve 
worked we have deposited materials with various universities and agencies, 
which is something we ought to do. However, there is no assurance that the 
materials will continue to be available. So when the level of confidence for 
long term preservation is not good, the entity should make efforts to deposit 
their collection with an established repository, whether it’s the L&C Archive
in Dallas or another archive, where there is that level of confidence. In a 
situation where multiple copies have been deposited in various repositories 
it should be clear among all of the repositories as to what can be done with
those materials, what level of access there is, what rights over those materials
exist, and who must be contacted to give whatever further permissions might
be needed.
E: In the case of Australia, for example, will our archive have a record of
where and what materials in Australia have been archived there, and be able 
to point researchers to that location? 



J: Yes, a great question. When materials are deposited with another archive 
and that is to be the archival collection for long term, then a record of those 
materials must also be given to us in Dallas, so that as we receive requests we 
can direct researchers appropriately. Making that inventory is a part of the 
process of formally depositing. In a lot of places the language projects work 
with some degree of autonomy, and it may be the personnel in the language 
project that develop the relationship with an archive. That is great, we need 
that to happen, but the agreement is formally between that SIL entity and that 
other organization. Part of that agreement is a thorough listing of materials 
that are deposited. I don’t mean in disgusting detail; I 
mean a listing of the collection contents that would be 
adequate to point researchers in the right direction, as 
well as be an adequate record placed in Dallas of what 
we actually deposited there. 

E: So our archives will not only ‘hold’ stuff but also it 
will have ‘pointers’ to stuff? 

J: Right. 
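
The difference between holding material and pointing to it can be pictured with a small Python sketch. The record fields, the `HOME` label, and the `locate` function are all hypothetical, introduced only to illustrate the idea of an inventory that covers material deposited elsewhere.

```python
from dataclasses import dataclass

HOME = "L&C Archives, Dallas"  # label for local holdings (assumed for this sketch)


@dataclass(frozen=True)
class CatalogEntry:
    """One line of a collection inventory (hypothetical fields)."""
    title: str
    language: str
    repository: str  # where the material actually resides
    contact: str     # who must be asked for further permissions


def locate(catalog, language):
    """Split the entries for one language into local holdings and pointers elsewhere."""
    matches = [entry for entry in catalog if entry.language == language]
    held = [entry for entry in matches if entry.repository == HOME]
    pointers = [entry for entry in matches if entry.repository != HOME]
    return held, pointers
```

A request for a given language would then return both what can be consulted in Dallas and where else the researcher should be directed.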



REVIEW ARTICLE

The Amazonian Languages. Edited by R.M.W. Dixon and Alexandra Y. 
Aikhenvald. 1999. Cambridge & New York: Cambridge University Press.

Pp. xxviii, 430. Cloth $69.95. 

Reviewed by TERRY MALONE 
SIL — Colombia 

0. Introduction. Until recently modern linguistic theory, as practiced in the
U.S. since the appearance of Chomsky’s well-known 1957 and 1965 books, 
has taken little or no note of Amazonian languages. Two distinguished 
linguistic scholars have commented on this situation: ‘When we began 
working together ... in the late 1970s ... there were scattered individuals in a 
number of countries in South America, mostly members of Christian 
missions, who were studying individual Amazonian languages, but general 
linguistics was being practiced almost entirely without reference to even the 
existence of Amazonian languages’ (Derbyshire and Pullum 1998:3). This 
was in spite of sporadic attempts to challenge theory on the basis of data 
from Amazonian languages (for just a few examples see Pike and Kindberg 
1956, Pike 1964, and David Payne 1974). Most, but not all, of this work 
tended to appear in out of the way publications, theses, dissertations, and 
specialist journals ignored by the theoretical mainstream. 1 

To be sure, some work in generative phonology had already begun to take 
notice of languages in the Amazonian area. The work was mostly related to 
developments in metrical theory (several works of Bruce Hayes), or in 
specific languages (work such as Everett and Everett 1984a,b). Earlier work 



1 The mainstream has equally overlooked the work of non-missionary linguists. One good
example is Kaye (1970), who in spite of a short time in the field managed to produce a good
partial description of Desano (Tucanoan) morphology. (A more complete Desano grammar
written by a missionary linguist with 35 years of experience in the region is now in press.) 
in autosegmental theory, chiefly with respect to nasal prosody within the
more general context of harmony systems, had taken a passing glance at 
languages of the region (see, for instance, Safir 1982, van der Hulst and 
Smith 1982 and Hyman 1982). These developments inspired theoretical 
linguists to dust off works such as Bendor-Samuel (1960, 1966), and Kaye 
(1971), or scrounge around for work such as David Payne (1974) and Smith 
and Smith (1971), 2 but it was developments in typological theory, in part as a
reaction to some of the failures of transformational-generative theory in the 
mid 1970’s, which awoke current linguistic consciousness. 

A crucial step in the awakening process transpired when ‘in 1976, a 
professor in London was expounding on why no object-initial languages 
existed in the world. A student in the class hesitantly raised his hand and 
said, “Excuse me, but I speak an object-initial language.” The professor was 
Geoffrey Pullum, and the student was SIL member Desmond Derbyshire’ 
(Cahill 1999:1). This incident was the beginning of a series of publications 
(Derbyshire 1979, 1981, 1985) which in interaction with the burgeoning field
of typological studies has done more than any other previous development to
bring the ignored and unknown languages of the Amazon to the
lagging attention of theoretical linguists.

A simultaneous, even more significant step was the development of a fruitful 
editorial collaboration between the professor and his former student, 
beginning with the work Derbyshire and Pullum (1981) and eventually 
resulting in Derbyshire and Pullum (1986, 1990, 1991, 1998). At roughly 
the same time the two got underway producing these volumes, developments 
recounted in the introduction of Doris Payne (1990) transpired: thanks to 
some professors and graduate students at the University of Oregon who had 
worked in the Amazon region the complex classifier systems in some of the 
languages of this region were brought to the attention of theoreticians, first 
in Doris Payne (1986) (in the now classic Craig 1986), and then in two 
articles in Doris Payne (1990). 

The next and still current surge of interest in Amazonian languages had its 
beginning with a renewed interest in the phenomenon of ergativity, most 
notably brought to the foreground of current theoretical consciousness in the 
papers of Comrie (1978), Dixon (1979) and the book by Plank (1979). 
Amazonian languages do not seem to enter into these earlier discussions; the 
year 1985 is the earliest date listed in Dixon (1994) for references discussing 
ergativity in Amazonian languages (in spite of earlier buried, provisional 
work such as Derbyshire 1983). Derbyshire and Pullum (1986), subsequent 



2 Early work such as Gomez (1980) has yet to come to the attention of theorists. 
work by Derbyshire, and the involvement of the typological school at the
University of Oregon (notably by the then Ph.D. student Spike Gildea) share 
major responsibility in bringing Amazon ergative systems to the attention of 
the wider linguistic world. 

Some of these efforts came to the attention of the senior editor of the book 
under review, who was in part responsible for the simultaneous upsurge of 
interest in ergativity. He had, in his own words: 

. . . devoted several decades to searching for substantive linguistic universals. In 
case after case, just as he thought he had achieved some significant typological 
statement, a counter-example popped up; and this was invariably from a 
language of Amazonia. He decided that the most sensible course of action was 
to learn Spanish and Portuguese and then go to South America ... In this way he 
achieved a degree of insight into the most complex linguistic area in the world 
today (3).

He also ran into the coeditor, who had in the late 1980’s begun researching 
obscure Brazilian Amazonian languages. One happy result is the book under 
review; another is the emergence of a second professional team who, like 
Derbyshire and Pullum, enjoy similar prestige in linguistic circles, and who 
can hopefully help to push forward what the productive Derbyshire and 
Pullum team has begun. ‘The Amazonian Languages’ is Dixon and 
Aikhenvald’s major step in that direction. 

A casual reader who is not a specialist in the study of South American 
languages and who is not aware of these theoretical developments might be 
tempted to pass by this book. That would be a crying shame: in spite of the 
title, the book is relevant to linguistic investigators in all fields, because it is 
hard to stumble across an Amazonian language that does not have something 
to offer to theoretical issues currently in vogue. Not only does the book 
provide an overview of the major language families of the Amazon Basin, 
but it also describes minor language families and isolates. In addition, it 
gives a good basic (if somewhat brief) introduction to the area and study of 
Amazonian languages. Two of the chapters describe synchronic language 
contact situations. Almost all of the chapters provide historical background 
for the study of each family or group of isolates, and all of them reveal areas 
relevant to current issues in phonology, morphology, syntax, semantics, 
comparative and diachronic linguistics, and grammaticalization. 

In this review article I briefly recapitulate the contents of each chapter, at the 
same time emphasizing characteristics unique to or unusual in the languages 
being described, and relevance to current issues in the field. After the 
content summary I comment on some more general issues which the book 
raises, and then evaluate the book as a whole. 
1. Contents. The editors first discuss conventions used in the book, 
including spelling, naming of language families, definition of ‘language’, 
and grammatical terminology, following what Dixon calls ‘basic linguistic 
theory’, i.e. ‘the accumulated tradition of linguistic description that has 
evolved over the last 2,000 years’ (xxvi). This is a welcome section, too 
often missing in linguistic books. 

In the first chapter they provide a good basic introduction to the book, 
discussing topics such as the purpose of the book, the situation of 
scholarship in the region, ‘cultural background’ of the region, ‘linguistic 
diffusion’ (areal linguistic features), proposed and likely genetic 
relationships, ‘the punctuated equilibrium model’ (see Dixon 1997), and the 
organization of the book. 3 The sections on current linguistic scholarship and 
‘cultural background’ are quite adequate for a linguistic anthology (that is 
what this book is), but anyone who desires or needs a fuller 
treatment of the cultural/historical situation of Amazonian languages should 
proceed to the excellent in-depth introductions in Derbyshire and Pullum 
(1986, 1990, 1991, 1998), where such information is more appropriate. 

Although the author of each chapter has been given considerable freedom to 
describe language families according to the character of the respective 
linguistic systems, each chapter follows a similar format. Most begin with a 
brief summary of the history of studies in the family under consideration 
(including discussions of available classifications and comparative 
reconstructions), all provide maps of the location of language groups with 
accompanying population statistics, and all summarize the phonological 
traits typical across the family’s languages. Each author then launches into a 
description of the language family’s grammar. It is here that approaches 
differ significantly; nevertheless, morphology figures strongly in the 
majority of families, though it does not always appear under that title. 
Syntax receives attention where it is appropriate, and some authors even 
manage to squeeze in limited information on discourse grammar (text 
linguistics). Unfortunately, the authors had to observe rigid constraints on 
length for reasons outside of their and the editors’ control; in spite of the 
incomplete descriptive coverage of Amazonian languages at present, even 
now a full book could be written on any of the seven families represented in 
the first eight chapters. 

The first four chapters after the introduction cover the three largest language 
families in Amazonia, both in number of languages and of speakers: ‘Carib’, 



3 Readers who are familiar with the writings of both editors will detect the fingerprints of 
the senior editor all over this introduction. 
by Desmond Derbyshire (23-64); ‘The Arawak language family’, by 
Alexandra Aikhenvald (65-106); ‘Tupi’, by Aryon Rodrigues (107-124); and 
‘Tupi-Guarani’, by Cheryl Jensen (125-164). Curiously, Arawak, the largest 
language family, comes second in the book, but otherwise the order reflects 
the relative importance of these families both in the region and in current 
studies. 

Derbyshire’s chapter on the Carib family is what one would expect from a 
scholar of his calibre; his mastery of the details in this summary reflects over 
40 years of exposure to Amazonian languages. In his review of available 
classifications he notes that ‘Carib comparative and historical studies lag far 
behind those of the other two large Amazonian language groups’ (25). He 
does not dwell long on phonology: in Carib languages the more interesting 
area is the interaction of phonological systems with the morphology, and it is 
here that phonologists will find plenty of fuel for theoretical fires, especially 
when considering the highly inflected verb systems. 4 

On the current linguistic scene Carib languages are most noted for their 
unique ergative systems. Because ‘ergativity ... to a greater or lesser degree 
governs the case marking, person marking, derivational processes and 
constituent order patterning’ (55), Derbyshire discusses it under several 
headings, including ‘inflectional morphology’ (31-37), ‘main clause 
structure’ (55), ‘subordinate clause structures’ (56-57), and a special section 
‘ergativity’ (60-61). There are five ‘dominantly ergative languages’ (61), 
and a variety of split systems; the now well-known OVS (actually OVA) 
order occurs in ergative languages of this family. As in the Mayan language 
family (also known for ergativity), there is a close relationship between 
possessive nominal prefixes and verbal person affixes. 

Derbyshire discusses at some length a current issue in the study of Cariban 
ergative systems: the two most distinguished scholars working in this 
language family differ on the origin and direction of diachronic change with 
respect to ergativity in the family. Derbyshire has argued elsewhere that 
earlier ergative systems are changing to accusative systems (Derbyshire 
1991, also in the bibliography of this article), whereas Gildea (1998) argues 
that accusative systems are changing to ergative systems. Derbyshire 
provides a fair treatment of both sides, noting that ‘Gildea’s research has 
been extensive and his diachronic approach is sound and persuasive’ (60), 
but at the same time he concludes that the issue cannot be satisfactorily 
resolved until ‘fuller descriptions become available of more Carib 



4 For a sampling, see Gildea (1995), also in Derbyshire’s bibliography. 
languages’, which in turn will allow more extensive comparative studies and 
‘a more reliable internal classification of the Carib family’ (61). 5 

Cariban languages offer other interesting characteristics for typologists, 
phonologists, and Chomskyan theoreticians: locative postpositions inflected 
for liquids, flat surfaces, open areas, and enclosed places (42-43); ‘the 
particle word class’ (mentioned only briefly), whose members Derbyshire 
insists are not clitics but instead phonological words (53); and a rich array 
of nominalizing suffixes functioning syntactically as complements or 
adjuncts (43-52). 

In a little over a decade Aikhenvald has managed to conduct extensive field 
work in Arawakan languages (mostly in south-western Brazil), and has an 
impressive command of the available literature. She is the only author who 
singles out endangered (vs. nonendangered) languages in her list of 
languages for the family (67-71), though comments on endangerment and 
survival prospects for the respective families can be found sprinkled 
throughout the book. In her discussion of available classifications she notes 
that ‘the first truly scientific reconstruction of proto-Arawak phonology ... 
was published by David L. Payne (1991). However, his subgrouping of 
Arawakan languages, which is based on lexical retentions, rather than on 
innovations, remains open to discussion’ (74). David Payne has done work 
on possible shared morphological innovations across South American 
languages (David Payne 1990), so that very likely he used retentions in an 
attempt to filter out the effects of pervasive areal diffusion. Aikhenvald’s 
evaluation here, as well as her own work on areal diffusion in the Amazon 
(one sample can be found later in this book), raises the more general 
theoretical issue of how one can dependably classify languages and trace 
genetic relationships in areas where extensive diffusion has taken place. 6 



5 Derbyshire says that ‘my view of the direction of change in the Carib family has been 
reinforced by a more general factor: the rampant ergativity that is found in so many Amazonian 
language families’ (61). Ergativity is actually more wide-spread in Central and South American 
languages: I mentioned the Mayan family above, but languages in the Chibchan family 
(geographically intermediate between Mayan and Amazonian territory in pre-Conquest times) 
also offer a variety of ergative features in a generally decreasing cline to the south (see Quesada 
1999 for a summary), and the Chocoan languages in western Colombia also sport ergative traits 
(Harms 1994). As one moves down the Andes, beginning in southern Colombia one runs into 
the thoroughly accusative Quechua languages. In evaluating Derbyshire’s and Gildea’s 
positions, one should keep in mind that Cariban and Arawakan languages extended further to 
the north in pre-Conquest times. 

6 My unpublished classification of Tucanoan languages mentioned in Barnes’ chapter in 
this book also uses shared retentions instead of innovations. I have put the manuscript aside, 
chiefly because the present pattern of spreading for innovations in the family is clearly due to 
language contact, which in turn suggests that I have reconstructed a former language contact 
situation, instead of a network of genetic relationships. 
Aikhenvald’s phonological summary (75-80) suggests that Arawakan 
languages have much to offer theoreticians in the area of phonology, 
including glottalization, aspiration and nasalization as word prosodies (79), 
and interactions of morphology with stress systems (for instance, in 
Asheninka ‘monosyllabic verbal roots have an obligatory prefix or a suffix, 
to make them bimoraic’, 80). In fact, some of the stress systems have much 
exercised phonologists: Hayes (1995:288-296) presents a detailed reanalysis 
of the analysis in J. Payne (1990), concluding that ‘this does not exhaust the 
stress phenomena of Asheninca’ (Hayes 1995:296), and mentions the 
language repeatedly throughout his book. Asheninca stress figures in several 
papers in the Rutgers Optimality Archive, and in other current literature; it 
has become a testing ground for current phonological theory. Other 
languages in this family appear to offer theoretical challenges in this area: 
for instance some, such as Achagua, assign stress according to grammatical 
word class. 7 

Arawakan languages also offer much of interest to typologists: inalienable 
and alienable possession (with ‘cross-referencing prefixes’, 82-83); unique, 
complex, multiple classifier systems in interaction with gender systems 
(some languages have three interacting classifier systems, and one, Palikur, 
has five) (83-84); and complex verb morphology, including 
cross-referencing prefixes and suffixes (two-thirds of the languages), split ergative 
systems (typical of most Arawakan languages with cross-referencing 
suffixes), abundant valency-changing derivations, complex tense-aspect, 
modality, directional, and aktionsart systems (82-94). 

Rodrigues is a fitting author for the more general description of the Tupi 
language family; he has over 40 years of experience working in the Amazon, 
and is currently the leading Tupian comparativist. Excluding the 
Tupi-Guarani branch, Tupian languages have been poorly described, or not 
described at all; Rodrigues makes do with what is at hand and provides an 
overall introduction to the entire family. Outstanding characteristics of these 
languages include inalienable and alienable possession, strict 
transitivity-based verb classes, ‘rich systems of demonstratives’ (120), 
‘subordinate clauses ... achieved through nominalization’ (121), and some 
ergative characteristics (chiefly ‘pivots’, 121). 8 



7 I base this statement on Achagua data that I have seen and analyzed. A phonology is in 
rough draft, coauthored by a Colombian linguist and an expatriate colleague. 

8 See Dixon 1994 for discussion of ‘pivots’ with respect to ergative systems. 
In her chapter Jensen, who has over 20 years of experience in the Amazon, 
and who is also a Tupian comparativist, presents a summary for 
Tupi-Guarani, the best known and best described branch of the Tupian family. 
Her historical-geographical introduction repays careful study; of all the 
families described in this book, this is the only one from which European 
colonists learned to speak a few of the more prevalent languages and used 
them in preference to their native tongues (see 125-133). She devotes a considerable 
amount of space to Tupi-Guarani phonology (133-145), perhaps best known 
among phonologists for a variety of unique nasal prosody systems, some of 
them bidirectional (135). Other traits of interest to phonologists include 
vowel epenthesis or consonant loss across morpheme boundaries (the latter 
interacts with metathesis) (136), and voicing of bilabial consonants at 
morpheme boundaries (137). Word-final consonants can devoice, disappear, 
or become nasals (142-144). 

Traits of interest to typologists include split ergative systems interacting with 
the person hierarchy (O has precedence over A). The system includes four 
complex sets of person markers interacting with transitivity and local 
discourse topic while marking A (subject of an active transitive verb), O 
(object of a transitive verb), Sa (the subject of active intransitive verbs), and 
So (the subject of stative intransitive verbs), ‘the genitive in nouns’, and/or 
‘the object of postpositions’ (146-148). 

The next three chapters cover four smaller language families 
which nevertheless are significant in Amazonian linguistics: ‘Macro-Je’ by 
Aryon D. Rodrigues (165-206); ‘Tucano’ by Janet Barnes (207-226); and 
‘Pano’ by Eugene Loos (227-249). Rodrigues emphasizes comparative 
studies in his chapter on Macro-Je (165-166, 198-201), because the internal 
consistency and external relationships of this group have been much 
questioned by area specialists. Nevertheless, there are also goodies which 
will interest the non-specialist, such as nasal prosody systems (in most of 
the languages oral and nasal vowels contrast, 171-174). One language 
(Karaja) has men’s and women’s speech (176-178). Plural marking is 
parsimonious (non-existent in the Je family), sometimes appearing on 
pronouns but usually not on nouns; in some languages plural (A or O) is 
marked within the verb stem (183-184). Some ergativity occurs; Rodrigues’ 
summary suggests that more work is needed to determine its exact nature 
(193-195). 

Barnes’ chapter on Tucanoan languages reflects roughly 30 years of 
experience in the Vaupes of Colombia, and immediate access to dozens of 
linguists who have similar experience studying these languages. She 
discusses Tucanoan classification using an unpublished manuscript of mine, 
basically an update of an older classification (Waltz and Wheeler 1972, in 
her bibliography). All Tucanoan classifications and reconstructions are 
clouded by hundreds of years of extensive internal diffusion between Eastern 
Tucanoan languages, raising the theoretical question of how one can classify 
languages within a family, and reconstruct a proto-language in such a 
situation. The sociolinguistic situation in the Vaupes is unique among all 
contact situations in the world; more details can be found in references in 
Barnes’ bibliography and in Aikhenvald’s chapter in this book on diffusion 
and language contact in the Içana-Vaupes area (see below). 

Tucanoan languages have much to offer the typologist and phonologist: they 
sport the most complicated, highly inflected evidential systems in the world 
(at least among the world’s languages that have been studied); most have 
rich classifier systems (suffixes on numbers, demonstratives, nominalized 
verbs, and some nouns) in interaction with a gender system; almost all have 
nasal prosody; and across the family a variegated array of poorly described 
pitch-accent or accent systems begs for comparative study. Nasal prosody in 
Tucanoan is difficult to describe without postulating ‘three-way 
autosegments;’ 9 in spite of some interesting proposals within Optimality 
Theory, it is questionable whether this theoretical quandary has been 
resolved. Curiously, the Tucanoan languages are all thoroughly 
nominative-accusative in a region where ergativity is dominant. 

Loos’ chapter reflects over 40 years of experience with Panoan languages. 
Panoan phonology is complex (230-234); it is perhaps most distinguished by 
‘nasal spread’ and complex interactions of morphology with the metrical 
system, i.e. ‘an odd-even syllable-timing characteristic common in Pano 
languages causes phonological modifications such as segment deletion, 
plosive nasal release, stress assignment and possibly vowel harmony’ (232). 

For typologists there are ‘a variety of split [subject-marking] systems, the 
marking of the A and S being affected by focus’ (236) within a system of 
‘transitivity concord’ (some subordinate clauses, some adverbial verb 
suffixes, locative phrases locating A or S, and certain sequential clauses 
must be marked according to verb transitivity). There is a complex switch 
reference system, perhaps the most complex in the Amazon, and best 
exemplified by Sparing-Chavez 1998 (described for Amahuaca, not 
available to Loos at the time of writing). Ergative marking occurs, uniquely 
marked by a syllable-final nasal /n/ which often disappears and leaves 
nasalization as its only clue; in turn, ‘in some of the languages the 



9 Barnes lists a paper in her bibliography describing how these segments operate (Barnes 
1996), both for pitch and nasal prosody in Tuyuca, but she does not mention this fact in the 
chapter, perhaps due to limited space. 
nasalization has been lost’ (240). Some languages have (apart from all the 
above) an impressive variety of verb suffixes; ‘in some languages more than 
130 verb suffixes are available’ (244). The chapter ends by briefly 
describing a complex system of deictics. 

Three chapters cover diminutive families: ‘Maku’ by Silvana and Valteir 
Martins (251-268); ‘Nambiquara’ by Ivan Lowe (269-292); and ‘Arawa’ by 
R.M.W. Dixon (293-306). Though the Maku family is now small (‘four 
languages belonging to seven tribes’, 251), Maku speakers were the original 
inhabitants of the Brazilian Upper Rio Negro (and Colombian Vaupes) 
regions, later conquered and displaced by Arawakan and then Tucanoan 
groups. The phonology is quite distinct from surrounding languages, offering 
an abundance of CVC words (in a region where most languages are strongly 
CV oriented) and more complex vowel systems, except for Kakua, which 
seems to be heavily influenced by the surrounding Tucanoan languages. 10 
Phonologies are actually more complex and fascinating than this summary 
would suggest: Yuhup has nasal harmony and phonetic pre- and post-nasalized 
voiced stops (Brandao Lopes and Parker 1999). 11 Daw, Kakua, and Yuhup 
have tone, in all cases incompletely analyzed. 

Maku languages have much to offer the typologist: inalienable and alienable 
possession; locative postpositions which some might interpret as classifiers 
because ‘their choice depends on the physical properties of the referent of 
the head’ (258); noun incorporation (only in Nadeb); and ergativity (Nadeb, 
and possibly Kakua — the other languages are nominative-accusative). 
Within the Maku family, Nadeb stands out glaringly: this matter is discussed 
somewhat in Aikhenvald’s chapter on areal diffusion in the Içana-Vaupes 
basin. 

The tiny Nambiquara family (in Brazil) also has nasal prosody (not 
mentioned as such by the author), three contrastive tones, and contrastive 
laryngealization on nasal and oral vowels. Resyllabification of underlying 
consonant-final verb and noun roots occurs when these are suffixed. In the 
area of typology these languages offer well-developed evidential systems, 
exceeded in complexity only by those found in Eastern Tucanoan languages. 
There is also a complex system of verb subordination suffixes, noun 
classifiers (occurring on nouns, as ‘deverbal nominalizers’, modifying 
adjectives in an NP, and numerals), and a set of subject, object and copula 



10 There is a factual error in the text here (256, second line). Tucanoan languages have 
six-vowel systems; Kakua does have a five-vowel system, as the authors state. 

11 This reference was not available to the authors. 
pronoun suffixes (including dual number). Lowe includes more on clause 
and interclausal syntax than do most authors in this book (277-279, 
284-289); this has long been a special emphasis of his research. 

The most notable characteristic of wider interest in the small Arawa family is 
probably the complex split ergative system, in which discourse topic, person, 
and noun suffixes all interact. The resulting construction then determines 
(for the most part) constituent order (299-300, 304-306). The languages 
have gender, and feminine is unmarked (masculine is considered to be 
marked). Some nouns ‘require a cross-referencing prefix /ka-/ on the verb 
and on some nominal modifiers (when the noun is in pivot function in the 
clause)’ (298). This is intriguing, in light of the /ka-/ of Arawak languages 
which has diffused into some Tucanoan languages; according to the analysis 
in Metzger 1999 one of the major functions of /ka-/ in Tucanoan and 
Arawakan is to indicate that a specific individual, item, or group is being 
focussed on. Some Arawakan languages are spoken in the general region 
where Arawa languages are found, though it does not look at first glance like 
much diffusion has taken place. 

Two chapters discuss small language families and language isolates: ‘Small 
language families and isolates in Peru’ by Mary Ruth Wise (307-340); and 
‘Other small families and isolates’ by Alexandra Y. Aikhenvald and R.M.W. 
Dixon (341-384). Both chapters represent a synthesis of the most disparate 
data to be treated in the book. Wise’s discussion of Peruvian languages 
shows an amazing command of detail, in addition to skill at organizing what 
looks at first glance like hopeless disarray; it reflects long experience in the 
region (roughly 35 years) and previous practice in synthesizing large 
amounts of data (see Wise 1990 for just one nice example). With respect to 
phonology Wise observes that ‘most of the languages in the five families 
differ from areal patterns in one or more traits’ (312), all carefully listed 
(312-318). The most interesting to phonologists likely will be nasal prosody 
(here she refers to the published version of David Payne 1974), tonal 
systems (tone is not that common in the Amazon, as this book attests), 12 
pitch accent systems in which stress (intensity) and high pitch (accent) do 
not necessarily coincide (described for Aguaruna in David Payne 1990a, but 



12 A fine resource now available for phonologists interested in tone systems of these 
languages (not available to Wise at the time of writing) is Walton et al. (1997); tone is marked 
on all entries in this dictionary. The third coauthor is a native speaker of Muinane, a member of 
the Witoto family. 
according to Wise, probably typical of all Jivaro languages), and a nasal /h/ 
which nasalizes vowels that follow it (in Arabela). 13 

Most of the languages are rich in morphology. Characteristics of interest to 
typologists include classifiers, mostly numerical (one Witotan language, 
Bora, has over 350, and uses them with pronouns, in addition to the more 
usual nouns, demonstratives, adjectives and verbs). Dual number, unusual 
for the Amazon (as Wise notes), appears in the Witotan languages and Yagua; 
in Murui Witoto interaction with a three-way gender system results in a 
complex pronominal system. Some evidence suggests ergative traits in 
Jivaroan languages, but this possibility has not been fully explored. Verb 
morphology is especially complex, but the categories expressed are more or 
less typical for the Western Amazon. 

Aikhenvald and Dixon’s skills in dealing with complexity and disparity rival 
those of Wise; the wide distribution (Brazil, Bolivia, Colombia, and 
Venezuela) of small families and languages they summarize, plus a scarcity 
of sources, has made their job even more difficult. This variegated array of 
poorly described or basically undescribed languages certainly underscores 
the point made in the editors’ introduction concerning the lack of attention 
given to Amazonian languages by the current linguistic world. 

Languages covered which should be of interest to non-Amazonian linguists 
include the Yanomami dialect continuum, distinguished by ‘a rich system of 
verbal classifiers’ (347-348), multiple verbal proclitics and suffixes (over 20 
on each end), extensive noun incorporation (‘any noun in S or O function’, 
350), and extremely productive verb compounding. Mura-Piraha 
phonology is only briefly mentioned, but it is certainly one of the more 
unusual ones to be found in the book, with a strikingly reduced consonant 
inventory (354) and a complex, as yet only partially analyzed tonal system 
(according to Everett 1986). 14 The most well-known feature (among 
theoretical phonologists), that of syllable weight partially determined by 
consonant onsets, is not even mentioned here (see Everett and Everett 
1984a, b). 15 Piraha is also somewhat unusual in that it lacks formal marking for 
tense, plurals, and possession. 



13 Walker and Pullum (1999) discuss this segment in a short report in Language, in which 
they argue that phonetically possible (or pronounceable) segments should not be excluded from 
phonology on theoretical grounds (see 769). 

14 To my knowledge, nothing has appeared since which would indicate otherwise. 

15 These references are not in the bibliography of this chapter. Readers who want to know 
more about Mura-Piraha phonology and who do not read Portuguese should look at 308-325 of 
The authors discuss the Guahiboan languages (Colombia and Venezuela) 
and some language isolates found in Colombia and Venezuela. Guahiboan 
languages are unusual in a number of areas: ‘suppletive forms of verbs’ 
depending on the number of the A or O (372); complex classifier systems 
(used with numerals, adjectives, and deictics) (373); ‘an unusually large 
number of oblique cases compared to other Amazonian languages’ (375); 
‘some traces of split ergativity, of an active-stative type’; and incorporation of 
inalienably possessed nouns, either in S function with verbs of physical state, 
or S, O, or oblique function with other verbs. 16 

The last two chapters discuss two linguistic areas characterized by linguistic 
and cultural diffusion across several language families: ‘Areal diffusion and 
language contact in the Içana-Vaupes basin, north-west Amazonia’ by 
Alexandra Y. Aikhenvald (385-416); and ‘The Upper Xingu as an incipient 
linguistic area’ by Luci Seki (417-430). Both chapters will be of interest to 
sociolinguists and linguists specializing in language contact. Aikhenvald’s 
chapter contains an impressive amount of linguistic detail, much of it based 
on her own fieldwork in both Arawakan and Tucanoan languages. The 
discussion covers most areas of phonology, morphology, and syntax; the 
author even includes a brief comparison of ‘syntax and discourse techniques’ 
(405-406). In contrast, Seki offers very little linguistic data; her intent 
apparently is to alert potential researchers to the possibility of rigorously 
documenting diffusional history from its beginning in a region where such 
work could be a real boon to struggling comparativists. 

Diffusion in the Içana-Vaupes basin is most distinguished by its almost 
exceptionless unidirectionality: the diffusion is from Eastern Tucanoan 
languages to geographically contiguous Arawakan and Maku languages. 
The only currently known exception is the Arawakan prefix /ka-/ ‘relative, 
attributive’ (Aikhenvald mentions this prefix in a footnote on p.392 and in 
her chapter on Arawakan languages — see p.95). 17 The unidirectionality is 
most certainly a reflection of sociolinguistic factors, as the author notes. 



Everett (1986) (which is in the bibliography, and which does mention the role of consonant 
onsets in determining syllable weight). 

16 A couple of important references on the Guahibo language (Kondo 1985 and Queixalos 
1985) do not seem to have been available to the authors, probably due to inaccessibility. 

17 A detailed description of the occurrence and function of /ka-/ in Tucanoan and 
Arawakan languages is found in Metzger (1998) (not available to Aikhenvald at the time of 
writing). 
2.0 Comments on more general issues. Several issues raised by the 
editors’ comments in the section on conventions and the book’s introduction 
need more comment for the benefit of those who have not worked 
extensively in the Amazonian basin. I comment on these issues in the order 
in which they appear in the book, and I include an issue not covered which 
should be brought to the attention of those who wish to pursue field work in 
the Amazon or use scholarship from the Amazonian area (perhaps as a result 
of reading this book). 

2.1 Terminology for language families. Specialists in South America when 
writing in English put an -an ending on the names of most language 
families. In Spanish, they do not, but instead write ‘the family X’, where X 
is the name without any adjectival (or other) ending. In this book the editors 
follow the Spanish convention when writing in English. In many cases this 
means that a language family and a language in the same family will have 
the same name. They insist that context is enough to distinguish between the 
two, but my own experience suggests that this is not always the case. 

In fact, I had never given the whole business much thought, until opening 
this book. When operating in Spanish, I (like most colleagues) automatically 
use the Spanish convention, and when using English, I follow the English 
convention. One must not only be bilingual with respect to vocabulary and 
grammar, but also with respect to conventions of language use. In this 
review article I have followed standard English convention, except when I 
quote from the book. It is noteworthy that in previous work and in at least 
one unpublished manuscript I have seen postdating the book the junior editor 
uses the standard English convention. 18 

2.2 Communication between missionary and Latin American linguists. 

Although the editors’ observations are amazingly accurate regarding this 
topic, they are incomplete. After long years in Latin America, it appears to 
me that neither side has told the editors the whole story, perhaps because 
both sides know the editors communicate with anybody and everybody they 
can find working in the field, no matter what their religion, politics, or 
nationality (plus the junior editor, aside from being a polyglot, has an 
outstanding gift for relating to most anybody). For one thing, the 
Judeo-Christian ethics of most missionary linguists would preclude recounting to 
the editors negative incidents in a long history of ups and downs as they have 
sincerely tried to relate to local scholars and academic structures, all this in 



18 In Dixon (1994) the senior editor uses the -an ending and ‘the X family’. I have not had 
access to anything he has written postdating this book, and am wondering if there might also be 
a difference between American and Australian usage here. 
the face of concerted regional negative publicity, which extended from the 
seventies until the early nineties, and still (as the editors note) lingers 
in the region. 

On the other hand, Latin American scholars are reluctant to comment on the 
highly politicized environment of too many universities, which has 
sometimes resulted in negative attitudes to the work of non-Latin linguists 
being a necessary part of maintaining one’s job and position. In Latin 
America it is much harder to find jobs and funding that allow one to describe 
these languages; that does not help matters. Fortunately, the funding 
situation is beginning to change, due in part to the founding of several 
organizations devoted to the preservation of endangered languages, to the 
creation of regional networks such as GT Linguas Indigenas (devoted to the 
study of Brazilian languages) and to activities of organizations such as 
SSILA (Society for the Study of the Indigenous Languages of the Americas). 

Most important, in Latin American thinking linguistic work and associated 
literacy and development work is basically a political enterprise: it politically 
empowers minority peoples. The language family histories cited in the book 
(especially for the Tupi group and the Tupi-Guarani subgroup) should make 
the implications of granting political power to minorities in the Latin 
American context clear. North American and European linguists for the 
most part do not look on linguistic and literacy work as political 
empowerment, but instead as part of meeting and respecting basic human 
needs and rights (along with access to medical services, clean water, and 
adequate nutrition), or as a basically apolitical good work. This divergence 
in world view has certainly made communication and mutual understanding 
between the two parties more difficult. 

2.3. Definition of the Amazonian region. The editors state that ‘in this 
book we attempt to cover languages spoken in the Amazon and Orinoco 
Basins — that is, from the north coast of South America, east to the mouth of 
the Amazon, west to the Andes, and south to the southernmost headwaters of 
the Amazon tributaries. If most of the languages in a family are spoken in 
the Amazon/Orinoco Basin (e.g. Arawak) then we cover that family. If most 
of the languages in a family are outside the region (e.g. Guaicuru) then we 
do not deal with that family’ (4). This agrees with the definition proffered in 
the introduction to Derbyshire and Pullum (1986:1) and followed in Payne 
1990. However, this definition excludes at least one language family in the 
northwest corner of the continent which seems more Amazonian than 
Andean: the Chocoan family, extending all along the Pacific coast and up 
into the western Andean range from eastern Panama to the Colombian-Ecuadorian 
border. On the southern edge of Amazonia the Guaykuruan 
languages, according to Alejandra Vidal in her description of Pilaga 
classifiers, present features typical of lowland Amazonian languages (Vidal 
1997:102). This will be evident to readers of this book who go on to read 
her article in the now defunct Journal of Amazonian Languages. 19 

These families on the two extreme ends of the Amazonian region raise an 
important question: when we talk about Amazonian languages, are we 
defining a linguistic area, or a geographic area? This issue is still to be 
resolved, and any possible answers are complicated by preliminary, as yet 
unpursued hints of linguistic diffusion between Andean and Amazonian 
languages. 20 It should be noted that linguists are not the only ones who have 
to cope with this problem: in a recently published field guide to neotropical 
rainforest mammals the author includes Central American Pacific coastal 
species that occur below 1000 meters altitude, observing that ‘these species 
for the most part form a discrete fauna: mammal communities within the 
rainforest region are very similar to one another in their composition of 
monkeys, opossums, sloths, bats, deer, and rodents ... A few species known 
to occur only above 1,000 meters have also been included ... There are many 
borderline cases, and we have made some arbitrary decisions about which 
species to include’ (Emmons 1997:1-2). This author clearly defines a 
biological area within a more general regional area, and there is also some 
residue. 

2.4. Physical and logistical difficulties inherent in doing field work in the 
region. The editors do not comment on this matter, which is a shame, in 
light of the senior editor’s impassioned plea in Dixon (1997) for more 
scholars to engage themselves in the documentation of undescribed, 
endangered minority languages. Those who are not familiar with the 
Amazon or South American region should not sign up quickly to join the 
descriptive enterprise — it is wise to first count the cost, and then plunge in. 



19 A handful of isolates which might be potential candidates in Ecuador, Peru (see p.307) 
and Bolivia (see footnote 3 on p.364) are also excluded, but I have too little information to 
include them here. 

20 In a footnote Aikhenvald mentions the possibility of diffusion between Chocoan and 
Tucanoan languages with respect to nasal spreading. I was the source of this information; I 
have observed other shared characteristics (chiefly lexical), but have not been able 
to study the similarities rigorously. In addition, she mentions the putative inclusion of the 
Chocoan family in the now generally discredited Macro-Chibchan subphylum in a footnote 
(370). The family has been considered an isolate, or alternatively stuck in the Carib family. A 
Chocoan linguist has commented to me that on the basis of his comparative studies he is 
tempted to locate the Chocoan languages in the Carib family. Until both the Carib and Chocoan 
families are studied more extensively, and more materials are available, it will be impossible to 
resolve these issues. 
One pays in several areas: one is wear and tear induced by the physical 
difficulties inherent in working in the Amazon basin. Dan Everett provides a 
telling description in his review of Doris Payne’s 1990 book (Everett 1991). 
All who have done long-term, serious fieldwork in the Amazon area have 
similar tales to tell; the details vary, but the overall theme is the same. More 
recently, in several countries, a more insidious danger has appeared: multiple 
armed illegal groups, mostly in marginal, isolated regions where a majority 
of poorly described minority languages are spoken. None of these armed 
groups care to see outsiders (no matter what nationality they may be) present 
on their turf, and usually employ brusque methods in dealing with this sort of 
perceived threat. 

Another area in which one must be willing to deal is that of providing the 
results of research to the speech communities with which one works, in a form 
which they can understand and use for their own benefit. More and more 
speakers of minority languages are tired of researchers bopping in for a few 
months, doing research which demands extensive community participation, 
and disappearing forever, without any tangible benefits accruing to the 
people themselves. The result is that academic specialists are more and 
more likely these days to run into speakers who demand more than salaries 
as remuneration for the language data that they share. These demands tend to 
crop up in less isolated groups, whose speakers are more aware of global 
trends, but more isolated groups may make other demands, usually not in the 
researcher’s field of expertise. It takes time and money to meet these 
demands, no matter what their nature. No ethnic community that I know of 
shares the Western academic passion for pure research; none cares what their 
language will add to the Western European cultural heritage of acquired 
knowledge (why should they?). In their view, what researchers (and other 
outsiders) do in their community should be good for something practical. 
This again is a matter of world view, and it is sure to catch the unprepared 
scholar by surprise. 

Another area in which one pays is that of professional advancement (see also 
the introduction to Payne 1990). It is difficult to conduct successful field 
work in South America without long term presence, or close association with 
scholars who have established the contacts and networks of relationships that 
come from long term residence in the area. Long term presence has its costs: 
field linguists resident in the Amazon area do not readily find the resources 
or time they need to keep up to date in linguistic theory, and the many 
unexpected demands associated with the descriptive task absorb more time 
than any reader who has not worked in the area could ever dream possible. 
The result is frustrating: it is very hard to find the isolated blocks of time one 
needs to do heavy duty analysis and write theoretically oriented, data rich 
articles in one’s native language — the kind of articles that major journals 
might accept (forget purely descriptive articles — that hasn’t been a hot trend 
for thirty years). Basically, this is the major factor which delays for so long 
the efforts of field linguists working in this area to cast research results in a 
form readily accessible to the outside world. 

3. Conclusion. Non-specialists in South American languages perhaps have 
little idea of the courage and hard work that it took to produce this volume. 
Many of the reasons are touched on in their introduction. Nevertheless, the 
editors have succeeded in producing an accurate and ample overview of the 
current state of affairs in Amazonian linguistics, chiefly because they went to 
people who have been working for years in the area, and prevailed upon 
them to describe the languages and groups with which they are most 
familiar. One result is that six of the chapters are authored by SIL linguists 
and almost all of the other chapters lean heavily on material produced by or 
personal communication with SIL linguists. Four chapters are produced by 
Latin American linguists. Two (Aryon D. Rodrigues and Lucy Seki) have 
considerable experience in the area, whereas others are newer to the field 
(Silvana and Valteir Martins). This is a reflection of the incipient (and 
hopeful) trend of Latin American scholars to become involved in a field 
where there is much need for more hard descriptive and comparative work. 

In fact, I think it is in part because Dixon and Aikhenvald are relatively new 
in the area (compared to most of the authors in the book) that they even 
dared to put together a book like this, and I’m frankly thankful they’ve 
pulled it off. As a result, we now have a useful, comprehensive linguistic 
anthology for the Amazonian region, something that did not exist before they 
stepped in. Many of us who have worked longer in the area look more at 
what hasn’t been done, then end up stalled by everything we can’t do in the 
middle of an exceptionally difficult work environment. Some would say the 
enterprise is premature, but after reading the book I strongly disagree. My 
hope is that in company with Derbyshire and Pullum’s work, this volume 
will be a potent force in changing the current descriptive situation for the 
better. Perhaps it will even have some influence on current linguistic 
theorizing, mostly taking place in parts of the world which seem like another 
planet as I sit in South America writing this review. 

The interface between syntax and discourse in Korafe, 
a Papuan language of Papua New Guinea. 

Published by Pacific Linguistics C-148. 

Degree granted by The Australian National University 1996. 

Cynthia J. M. Farr 
SIL— Papua New Guinea 

This dissertation focuses on the structure and function of three types of 
complex constructions which are central to Korafe discourse: (1) serial verb 
constructions (SVCs), (2) switch reference constructions (SRCs), and (3) co-ranking 
constructions. 

Each of the complex construction types has as obligatory constituents two or 
more clauses or verbs. SVCs and SRCs are ‘chaining’ constructions, which 
terminate with a verb, more finitely inflected than the preceding verbal 
constituents, i.e. verb stems in SVCs and medial verb forms in SRCs. 
Syntactic constraints marked on or implicit in chaining structures enable the 
speaker to monitor subject reference from verb to verb without using many 
overt noun phrases. This reduction in noun phrases brings with it a 
corresponding focus on the events represented by the verbs in chaining 
structures. The order of the verbs in these chains is non-reversible and 
mirrors the order of the events they represent in the real world. This makes 
them choice vehicles for conveying the foregrounded story line in narratives, 
legends, and procedures. Utilising verbs without their standard arguments to 
a) represent familiar events (e.g. ghambudo ‘dig’ for ‘dig a hole’, jedo 
govedo ‘chop plant’ for ‘making a garden’), and b) mark shifts in venue or 
temporal setting (aira buvudo ‘he went and arrived’, ravara atetiri ‘they 
slept and it [day] dawned’) enables the speaker to concentrate on the 
specifics of the story in question, using noun phrases to highlight dominant 
and/or prominent participants and props. 

In ‘co-ranking’ constructions, all the constituents terminate with verbs of the 
same rank, namely final verbs, or in topic-comment constructions, predicate 
complements. Co-ranking constructions combine clauses, SRCs, and/or 
other co-ranking sentences by juxtaposing or conjoining them. Co-ranking 
constructions supply background information in discourses that primarily 
present events in iconic order. They are also extensively used in more 
thematically oriented discourses, such as encyclopedic descriptions, 
explanations, and hortatory speeches. 
SRCs and co-ranking structures may be segmented into information chunks 
that are thematically unified. These thematic clause chain units (TCCUs) are 
defined by formal and semantic criteria. They range from one to nine words, 
comprising up to five clauses in an SRC and averaging between one and four 
seconds in length. They are uttered as a basically pause-free unit. 

Cynthia J. M. Farr, SIL, Box 36, Ukarumpa, EHP 444, Papua New Guinea. 

E-mail: jim-cindi_farr@sil.org 



The dynamics of language spread: A study of the 
motivations and the social determinants of the spread of 
Sango in the Republic of Central Africa. 

Degree granted by The Graduate Group in Sociolinguistics, 
University of Pennsylvania 

Mark Edward Karan 
SIL-Central African Group 

Language spread, the process of a language expanding into new geographic 
and language-use areas, has been studied largely through observational 
methods. Thus discussions of the dynamics of the process have been largely 
based only on observation data. The present work employs a memory span 
test to evaluate the competence of a large number of subjects in a spreading 
language, Sango of the Republic of Central Africa. This large-sample, 
quantitative measure of competence enabled statistical studies of the social 
determinants and predictors of competence in the spreading language. The 
results indicate the overriding importance of motivations on the individual 
level in understanding the dynamics of language spread. Based on these 
motivations focused on the individual, a framework for discussion, research, 
and intervention in language spread is presented, along with guidelines for 
more successful intervention in shift situations. Numerous researchers have 
linked language spread and language change (language internal modification 
over time), but without substantive comparisons of the two. This 
quantitative study of language spread provides data on the distribution of 
social factors (age, sex, education, etc.). These distributions are very similar 
to the distributions of social factors in language change, indicating that 
language spread and language change are similar processes. 

Mark Karan, SIL, B.P. 1990, Bangui, Central African Republic. E-mail: Mark_Karan@sil.org 

Web-Based Language Documentation and Description 
and the Open Language Archives Community 

Workshop 



This workshop (December 12-15, 2000, University of Pennsylvania) 
brought together linguists, archivists, software developers, publishers and 
funding agencies to discuss how best to publish information about language 
on the internet. The workshop and the Open Language Archives Community 
which is developing out of it seem especially important for us in SIL. I was 
pleased to be among those representing SIL, and hope that this report will be 
useful to others in SIL in understanding these new developments in the 
linguistics publishing and archiving field. 

The aim of the workshop was to establish an infrastructure for electronic 
publishing that simultaneously addresses the needs of users (including 
scholars, language communities, and the general public), creators, archivists, 
software developers, and funding agencies. Such an infrastructure would 
ideally meet a number of requirements important to these different 
stakeholders, such as: 

• Provide a single entry point on the internet through which all 
materials can be easily located, regardless of where they are stored 
(on the internet or in a traditional archive). Essentially, this would 
be a massive union catalog of the whole internet and beyond. 

• Identify every language uniquely and precisely, so that all materials 
relevant to a particular language can be located. 

• Make available software for creating, using, and archiving data 
(especially data in special formats); this includes software to help 
convert data from older formats to newer ones. 

• Serve as a forum for giving and receiving advice about software, 
archiving practices, and related matters. 

• Provide opportunity for comments and reviews of materials 
published within the system. 



J. Albert Bickford 
SIL-Mexico Branch 

The workshop was organized by Steven Bird (University of Pennsylvania) 
and Gary Simons (SIL). 1 It included approximately 40 presentations and 
several working sessions on a variety of topics. 

There was general agreement among the participants that a system for 
organizing the wealth of language-related material on the internet is needed, 
and that an appropriate way to establish one is by following the guidelines of 
the Open Archives Initiative (OAI) http://www.openarchives.org/. (These 
guidelines provide a general framework for creating systems like this for 
specific scholarly communities.) An OAI publishing and archiving system 
contains the following elements: 

• Data providers, which house the materials that are indexed in the 
system. 

• A standardized set of cataloguing information for describing each of 
the materials, also known as ‘metadata’ (i.e., data about data). 

• Service providers, which collect the metadata from all the data 
providers and allow users to search it in various ways so as to locate 
materials of interest to them. 
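
The division of labour among these three elements can be sketched in a few lines of Python. This is only an illustration of the idea, not the OAI protocol itself: the class names, the fields of Record, the language codes xxa and xxb, and the archive names and locations are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One piece of cataloguing information (metadata) about a resource."""
    title: str
    language: str   # hypothetical language identifier
    location: str   # where the material itself is housed


class DataProvider:
    """Houses materials and exposes the metadata records that describe them."""

    def __init__(self, name, records):
        self.name = name
        self._records = list(records)

    def list_records(self):
        return list(self._records)


class ServiceProvider:
    """Collects metadata from many data providers and lets users search it."""

    def __init__(self):
        self._index = []

    def harvest(self, provider):
        # Only the metadata is copied; the materials stay with the provider.
        self._index.extend(provider.list_records())

    def search(self, language):
        return [r for r in self._index if r.language == language]


archive_a = DataProvider("Archive A", [
    Record("Field notebook scans", "xxa", "archive-a/shelf-12"),
    Record("Wordlist recordings", "xxb", "archive-a/tape-3"),
])
archive_b = DataProvider("Archive B", [
    Record("Draft dictionary", "xxa", "archive-b/files/dict"),
])

catalog = ServiceProvider()
for provider in (archive_a, archive_b):
    catalog.harvest(provider)

hits = catalog.search("xxa")
for record in hits:
    print(record.title, "->", record.location)
```

The point of the design is that a user searches in one place, while each archive remains responsible for storing its own materials.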

In the case of linguistics, the system will be known as the Open Language 
Archives Community (OLAC). The LINGUIST List http://www.linguistlist.org 
has agreed to serve the system as its primary service provider. It will be the 
main source that users will use to find materials through the system. Further 
information about OLAC can be found at http://www.language-archives.org. 
The agreement to establish OLAC is probably the most important 
accomplishment of the workshop. 

This agreement was solidified through working sessions which met during 
the workshop and started the process of working through the details in 
various areas, such as: 

• Character encoding: Unicode, fonts, character sets, etc. 

• Data structure for different types of data (lexicons, annotated text, 
etc.). 

• Metadata (cataloguing information that should be common to the 
whole community and how it should be represented in the 
computer) and other concerns of archivists. 

• Ethics, especially the responsibilities that archivists and publishers 
have to language communities. 



1 Funding was provided by the Institute for Research in Cognitive Science (IRCS) of the 
University of Pennsylvania, the International Standards in Language Engineering Spoken 
Language Group (ISLE), and Talkbank. 
• Expectations of users, creators (e.g. authors), software developers. 

These and other issues will continue to be discussed on email lists in the 
coming months, ultimately culminating in recommendations for ‘best 
practice’ in each area, together with a preliminary launch of the whole 
system, hopefully within a year. (Prototypes of the system are available now 
at the gateway address above, along with various planning documents.) 

There were also a number of conference papers which provided a foundation 
for making the working sessions productive. Rather than list or review all 
the presentations here, I will summarize them, since they are all available on 
the conference website 

http://www.ldc.upenn.edu/exploration/expl2000. 

The topics covered included the following: 

• Proposals for various aspects of the OLAC system. 

• Concerns of various stakeholders, such as archivists, sponsors, 
language communities. 

• Descriptions and demonstrations of specific software, research 
projects, and web publishing systems. 

• Metadata and metadata standards. 

• Technical issues, such as Unicode, the OAI, sorting, data formats 
for different types of language materials (e.g. dictionaries, 
annotated text, example sentences in linguistic papers, and audio). 

One insight gleaned from these presentations was a better understanding of 
glossed interlinear text. Interlinear text is not a type of data, but rather just 
one possible way of displaying an annotated text. The annotations on a text 
can consist of many types of information: alternate transcriptions, morpheme 
glosses, word glosses, free translations, syntactic structure (and possibly 
several alternative tree structures for the same text), discourse structure, 
audio and video recordings, footnotes and commentary on various issues, 
etc. What ties them all together is a ‘timeline’ that proceeds from the 
beginning to the end of a text, to which different types of information are 
anchored. Aligned interlinear glosses are one way of displaying some of this 
information, but not the only way, and not even the most appropriate way for 
some types of information. The traditional arrangement of Talmudic 
material, for example, with the core text in the center of the page and 
commentary around the edges, is another possible display of annotated text, 
in which the annotations are associated more with whole sentences and 
paragraphs than with individual morphemes. There are also some 
sophisticated examples available for presenting audio alongside interlinear 
text. (See a sample at the LACITO archive: 
http://lacito.archive.vjf.cnrs.fr) 
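
To make the distinction between annotated text and its display concrete, here is a minimal Python sketch. It assumes a very simple model in which every annotation is anchored to a start and end point on the timeline; the tier names, the Annotation class, the interlinear function, and the toy English data are all hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    tier: str    # kind of information, e.g. "words" or "gloss"
    start: int   # point on the timeline where the annotation begins
    end: int     # point on the timeline where it ends
    value: str


def interlinear(annotations, tiers):
    """Display the chosen tiers as aligned rows, one column per time span."""
    spans = sorted({(a.start, a.end) for a in annotations if a.tier in tiers})
    cell = {(a.tier, a.start, a.end): a.value for a in annotations}
    # Each column is as wide as its longest entry on any displayed tier.
    widths = [max(len(cell.get((t, s, e), "")) for t in tiers)
              for s, e in spans]
    rows = []
    for t in tiers:
        cells = [cell.get((t, s, e), "").ljust(w)
                 for (s, e), w in zip(spans, widths)]
        rows.append(" ".join(cells).rstrip())
    return "\n".join(rows)


annotations = [
    Annotation("words", 0, 1, "dogs"),
    Annotation("words", 1, 2, "bark"),
    Annotation("gloss", 0, 1, "dog-PL"),
    Annotation("gloss", 1, 2, "bark"),
    Annotation("free", 0, 2, "Dogs bark."),   # anchored to the whole sentence
]

# One possible display: aligned interlinear rows.
view = interlinear(annotations, ["words", "gloss"])
print(view)

# Another display of the same data: sentence-level annotations only.
free = [a.value for a in annotations if a.tier == "free"]
print(free)
```

The same list of annotations feeds both displays; nothing about the stored data commits it to the interlinear layout.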

Throughout, it was very clear that those at the conference had a great deal in 
common with each other, and there are many points of contact with our 
concerns within SIL: 

• A primary concern for descriptive (as distinguished from 
theoretical) linguistics. 

• A desire to make language materials available to communities of 
speakers and the general public as well as scholars. 

• An interest in taking advantage of the Internet, which provides a 
means of publishing such materials that bypasses the limitations of 
traditional publication (since the costs are so much lower, and thus 
appropriate for materials that have smaller audiences). 

• Awareness that many materials may be less than fully polished yet 
still valuable to some people and worth archiving. 

• A sense of frustration with the currently confused state of the art in 
data formats, especially fonts and character encoding, and the lack 
of good information about how best to archive and publish on the 
web. 

• Awareness of the large amount of data that is in data formats which 
will be obsolete in a few years (and thus a willingness to accept data 
in whatever form it is in, while also seeing a need for software to 
help convert data to newer formats). 

• A strong suspicion toward and distrust of rigid requirements, yet a 
willingness to adopt standards voluntarily when their usefulness has 
been demonstrated. 

Given that we have so much in common, it was very appropriate that SIL 
was so actively involved in the workshop and that we should continue our 
involvement. 



• Several SIL members will be active in the continuing discussions 
leading toward inauguration of the OLAC system. 

• Ethnologue codes will almost certainly be used as unique language 
identifiers within the system, and SIL will be recognized as the 
naming authority for assigning them. (Other naming authorities 
will hopefully also be recognized, particularly for areas that the 
Ethnologue does not provide codes for, such as ancient languages 
and language families.) 
• SIL software is of interest and use to other people, especially 
products like Graphite (for supporting complex non-Roman scripts) 
and Shoebox. In our software development efforts we would do 
well to keep in mind the needs of linguists outside of SIL. Their 
needs are not very different from ours, and sometimes small 
changes in software design can make big differences to eventual 
usability (for us as well as them). 

• Other people are developing specialized linguistic software that 
could be very useful to us or which contains ideas which we can 
incorporate in our own software, especially including tools for 
dictionaries and annotated text. 

• There are many people (linguists, archivists, and software 
developers) who are more than happy to share their expertise with 
us as we work through our own archiving and publication issues. 

• There is funding available to help us develop our own archives, 
provided we are willing to share the contents with others. 

In short, this conference opened up avenues of cooperation with a number of 
important partners in the academic world — a class of potential partners that 
has tended to be ignored in discussions within SIL in recent years. 

Finally, the conference pointed out several trends that will be increasingly 
important in future years, both inside and outside SIL. 

• The speakers of lesser-known languages will be more actively 
involved in the production and use of materials in and about their 
languages, and their concerns will increasingly have to be 
considered by scholars. These include carefully documenting 
permissions and levels of access to materials, making sure that 
language materials are available to the communities themselves, 
and being careful that scholars do not inadvertently aid commercial 
interests in exploiting native knowledge-systems (such as medicinal 
use of plants) without appropriate compensation. 

• The boundary between publishing, libraries, and archiving is being 
blurred by the shift to the digital world. Materials can be ‘archived’ 
on the web, which is a type of publication. Electronic ‘libraries’ are 
springing up in many places. Published and unpublished works 
from around the world can be listed together in one common 
catalog. The same technology is important in both spheres of 
activity. In short, these activities are merging under a new umbrella 
that could be called ‘scholarly information management’. A 
corollary to this trend is that archiving is not just something you 
do at the end of a language project; it is part of the ongoing process 
of managing the information that the project produces. 

• In such a world, and with huge numbers of resources available to 
sift through, metadata becomes increasingly important. A freeform 
paragraph description in a publications catalog is no longer good 
enough. It is the metadata that users will consult in order to find 
materials of interest to them, so the metadata must be carefully 
structured, accurate, and current. More and more, we will have to 
think not just about producing materials but also about how to 
describe them so as to make them accessible to others. 

• Unicode http://www.unicode.org is the way of the future for 
representation of special characters in computers. The days of 
special fonts for each language project are numbered. Instead, 
Unicode will make possible a single set of fonts that meets virtually 
everyone’s needs in the same package. Over the next few years all 
of us will be switching our computers over to using Unicode almost 
exclusively (that is, if we want to take advantage of newer 
software). Our computer support personnel are already actively 
involved in this transition, and have been for several years. It has 
already impacted in various ways on users within SIL, even if we 
haven’t realized it, but from now on the impact will be much more 
apparent. Get ready to change your fonts one more time — but once 
you’ve done so, that should be the last. 

• Language data will increasingly need to be structured carefully so 
that not only can people view it and use it, but machines will be 
able to understand and manipulate it in various ways. Many of us 
are familiar with standard format markers, which have been the 
primary means within SIL of marking the structure of data for over 
15 years. Despite its usefulness, it has some limitations. Standard 
format will, within the next few years, be replaced by a more 
comprehensive system called XML which is widely supported in 
the computer industry and which can do everything that standard 
format can do and more. 2 



2 XML stands for ‘Extensible Markup Language’. Since its development has been closely 
associated with the World Wide Web Consortium http://www.w3.org/XML/, it has been widely 
regarded as the successor to HTML for web pages. However, this is just a small part of its 
usefulness; it is a general-purpose system for representing the structure of information in a 
document or database, which can be customized for a myriad of purposes. Many software tools 
are currently available for creating and manipulating data in XML, with more being created all 
the time. One, Extensible Stylesheet Language Transformations (XSLT) http://www.w3.org/TR/xslt, 
can do complex restructuring of XML data, including many of the things that we have used CC 
for in the past, but with far less programmer time. 
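
As a rough illustration of the move from standard format markers to XML, the following Python sketch converts a small backslash-marked lexical file into nested XML using only the standard library. It is a simplified sketch under stated assumptions, not a description of any SIL tool: the function name sfm_to_xml, the element names database and entry, the choice of lx as the record marker, and the sample data are all invented for the example, and continuation lines are simply ignored.

```python
import xml.etree.ElementTree as ET


def sfm_to_xml(text, record_marker="lx"):
    """Turn backslash-marked lines into XML, one <entry> per record."""
    root = ET.Element("database")
    entry = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch skips blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        # A record marker (or the very first field) opens a new entry.
        if marker == record_marker or entry is None:
            entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, marker).text = value.strip()
    return root


sample = r"""
\lx dog
\ps n
\ge canine
\lx bark
\ps v
"""

root = sfm_to_xml(sample)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Where standard format leaves the grouping of fields into records implicit, the XML version states it explicitly through nesting, which is what lets general-purpose tools manipulate the data.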

All in all, it was a workshop that was both stimulating and practical, one 
which will have an unusual amount of influence in months and years to 
come. 

J. Albert Bickford, P. O. Box 8987, Catalina, AZ 85738-0987. E-mail: albert_bickford@sil.org 



International Linguistic Association,
46th Annual Conference. Languages of the Americas,
Native and Non-native.
New York, NY. March 30, 31, April 1, 2001

Terry Malone 
SIL — Colombia 

According to their website (http://www.ilaword.org), the International
Linguistic Association was founded in 1943, in part at the impulse of exiled
European scholars, and the membership is still international. Although in
the two decades after its founding the ILA played a significant role in the 
world of American theoretical linguistics, the advent of Chomskyan 
linguistics has done much to overshadow and even eclipse it. This seems a 
shame: most members are dedicated to advancing theoretical linguistics by 
practicing applied linguistics. Few ILA members subscribe to the 
Chomskyan paradigms, but this is perhaps the society for those scholars who 
need a variety of contacts yet can’t afford to belong to half a dozen 
specialized societies all at once. Fellow members will include text linguists,
typologists, Indo-European linguists, systemic linguists, specialists in
TESOL, university teachers of foreign languages, bilingual educators, 
sociolinguists, and others outside the current American theoretical 
mainstream. All are doggedly practicing linguistics as the society’s founders 
conceived of the discipline; hopefully some day the fashionable tides on 
American linguistic shores will turn, so that these scholars will receive a 
fraction of the appreciation that they deserve. 

The conference theme resulted in about a day’s equivalent of papers and 
plenary sessions on Native American languages, and another day’s 
equivalent of papers on Spanish, Portuguese, and AAVE (African American 
Vernacular English), plus a panel on the latter. There was about a half-day’s 
equivalent of papers on more general linguistic themes. Scholars such as 
Aaron Broadman, Jill Brody, Harriet Klein, Marianne Mithun, David Payne, 
and Rachelle Waksler presented papers or conducted plenary sessions on 
Native American languages, along with less well-known scholars, linguistic 
‘grunts’ (i.e. OWLS such as myself), and a handful of scholars who do not
specialize in Native American linguistics but who drew instructive 
typological parallels with more well-studied languages. 

The papers that I was able to hear on Native American languages all had 
immediate relevance to current theoretical issues, both within and outside of 
the Chomskyan mainstream. These included hot topics such as morphemes 
which do not quite fit in any of the categories ‘clitic’, ‘affix’ or ‘word’ 
(‘Floating morphemes in an Amazonian language’ by Sidney Facundes), 
discourse markers which elude simple analysis at any grammatical level 
(‘Marking Focus in Chatino’ by Troi Carlton and Rachelle Waksler),
grammatical categories whose membership is not clean-cut (‘The category of 
adjectives in Southern Guaykuruan languages’ by Alejandra Vidal and 
Harriet E. Manelis Klein), gender systems (‘Cross-linguistic view of Gender 
in Ojibwe’ by Donald Steinmetz), and multiple classifier systems in one 
language (‘Classifiers in Chimila’ by yours truly). 

The three plenary sessions were high points for me: David Payne discussed
‘areas of linguistic typology for which South American languages challenge 
current linguistic wisdom’, along with some possible reasons for such 
typological divergence; Ofelia Garcia discussed a hypothesis she is 
exploring concerning what happens when speakers of a minority language 
(Spanish in New York City) try to shift to a majority language (English) but 
do not receive the rewards for language shift that they had anticipated; and 
Marianne Mithun illustrated the crucial need to take diachronic processes 
into account when explaining typological correlations (especially those 
which appear to violate typological universals). 

The small size of the conference (roughly 60 papers were listed in the 
handbook and a few of these cancelled) made it possible for me to interact 
personally with many of the scholars specializing in Native American 
linguistics; that is perhaps what I most appreciated about this conference.
They all had feedback to offer this linguistic ‘grunt’, and that ardently 
coveted feedback was a major reason I responded to Ruth Brend’s and Mike 
Cahill’s initial call for papers. My biggest disappointment was to see that 
other than David and Judy Payne, I was the only SIL member from the 
Americas to attend this conference. SIL linguistic ‘grunts’ have much to 
offer the members of ILA, and they have much to offer us; both parties 
suffer when such opportunities are ignored. 






Terry Malone, AA 1930 Santa Marta, Magdalena, Colombia, S.A. 

E-mail: terry_malone@sil.org 



Reviews 



Chomsky’s universal grammar, 2nd edition. By Vivian Cook and Mark
Newson. Cambridge, MA: Blackwell Publishers, Inc. 1996. 369 pp.
Hardback $62.95, paper $25.95.

Reviewed by ALAN BUSEMAN 
SIL—ICTS, Waxhaw

Noam Chomsky has had a profound impact on modern linguistics. This
book is an extremely readable introduction to his theories. The first edition 
of this book — published in 1988 — covered the theory known as Government 
and Binding. This second edition is revised to cover more recent versions of 
Chomsky’s theory including Principles and Parameters Theory and the 
Minimalist Programme. 

The authors state their purpose on page 1:

The aim of this book is to convey why Chomsky’s theory of language is 
stimulating and adventurous and why it has important consequences for all those 
working with language ... This book is intended chiefly as an introduction for 
those who want to have a broad overview of the theory with sufficient detail to 
see how its main concepts work, rather than for those who are specialist students 
of syntax . . . 

The authors fulfill their purpose extremely well. The book is easy to read 
and very interesting. The examples are fun. The presentation is nicely 
done — with key concepts summarized in boxes with a gray background. The 
editing is extremely good — with almost no noticeable errors. 

This book is a perfect starting point for a linguist from a different theoretical 
framework who wants to get an overview of Chomsky’s theories. It 
introduces the basic concepts and defines the basic terminology in a very 
thorough and inviting way. The text also includes hundreds of quotes and 
references which will be helpful to anyone who wants to read further on the 
subject. 

Why would you as a field linguist want to read this book? One reason is that 
Chomsky’s concepts are so pervasive in linguistics that you need to 
understand them and know the vocabulary used to express them to help you 
understand much of the other current literature in linguistics. Another reason 
is that since Chomsky’s grammar is universal, it should apply to the 
language you are studying. This may help you start with a more effective set 
of expectations and can lead to a number of interesting questions about the
language. You may even find the language you are studying relates in some 
interesting way to the proposed universal grammar, and you may be able to 
publish something about it. 

This is one of my favorite kinds of books and I thoroughly enjoyed it. It 
looks back enough to show the most influential writings and lines of thought 
in the development of the theory, and it is current enough to summarize the 
main current lines of thought. It helped me understand Chomsky’s 
motivations and reasoning. 

If you wish you knew more about what Chomsky has been up to and why his 
influence in linguistics has been so powerful, start with this book. 

Alan Buseman, P. O. Box 248, Waxhaw, NC 28173. E-mail: Alan_Buseman@sil.org



Meaning in language: An introduction to semantics and pragmatics. By
D. Alan Cruse. New York: Oxford University Press. 2000.
424 pp. Paper $24.95

Reviewed by George Huttar 
SIL — Africa Group 

Here is a valuable reference work for your library. Its coverage in semantics, 
especially lexical semantics, is broad and detailed, and it includes a useful
introduction to pragmatics as well. A summary of its organization is 
probably as good a way as any to give you an idea of what it covers, though 
this minimal mentioning of topics falls far short of doing justice to the 
quality of C’s coverage: 

Part 1 ‘Fundamental Notions’ opens, as you might expect, with Ch. 1
‘Introduction’ (3-16). Situating the study of meaning within linguistics and 
other disciplines, C also makes clear what approach to expect throughout the 
book: e.g., ‘Meanings are not finitely describable, so [the] task [of specifying 
or describing meanings] boils down to finding the best way to approximate 
meanings as closely as is necessary for current purposes’ (13). While the 
book takes an ‘ecumenical’ position as far as theories go ... (14): 

... in so far as there is a theoretical bias, it is towards the cognitive semantic 
position. This means, in particular, that the meaning of a linguistic expression is 
taken to arise from the fact that the latter gives access to a particular conceptual 
content. This may be of indeterminate extent: no distinction is made between
linguistic meaning and encyclopedic knowledge.

Ch. 2 ‘Logical matters’ (17-39) covers arguments and predicates; sense,
denotation, and reference (including intension and extension); sentence,
statement, utterance, and proposition; logical classes; logical relations;
quantification; and use and mention — all those fundamental notions you 
have to keep straight when thinking through seriously what’s going on in 
basic sentence and lexical semantics. 

C’s contextual approach to meaning shows up again in Ch. 3 ‘Types and 
dimensions of meaning’ (41-63), where meaning is characterized as 
‘anything that affects the relative normality of grammatical expressions’ 
(43) — a characterization which leads to a brief but useful treatment of how to 
distinguish grammatical from semantic anomaly. Most of the chapter 
describes dimensions of descriptive meaning — the objective meaning that 
determines truth value and reference — and of non-descriptive meaning. To 
the former belong differences of quality (red vs. green, dog vs. cat), intensity
(large vs. huge, scared vs. terrified), specificity (dog vs. animal), vagueness,
and six other dimensions. The latter covers expressive meaning (Stop 
blubbering vs. Stop crying) and variants according to register (further broken 
down into field, mode, and style) and dialect. This number and variety of 
distinctions is typical of the whole book’s degree of detail. Part 1 concludes 
with Ch. 4, ‘Compositionality’ (65-81), the focus of which is ‘on the way 
meanings combine together to form more complex meanings’. 

Part 2 ‘Words and Their Meanings’ is introduced thus (with page numbers 
added) (83): 

To the layman, words are par excellence the bearers of meaning in language. 
While it is in danger of understating the importance of other linguistic structures 
and phenomena in the elaboration of meaning, this view is not entirely 
unjustified: words do have a central role to play in the coding of meaning, and 
are responsible for much of the richness and subtlety of messages conveyed 
linguistically. Hence it is no accident that this part of the book is the most 
substantial. Here, after the introductory Chapter 5 [85-102], we discuss how 
word meanings vary with context (Chapter 6 [103-124]), the relations between 
word meanings and concepts (Chapter 7 [125-142]), paradigmatic sense 
relations (Chapters 8 [143-161] and 9 [163-176]), larger vocabulary structures 
(Chapter 10 [177-196]), how new meanings grow out of old ones (Chapter 11
[197-216]), how words affect the meanings of their syntagmatic neighbours 
(Chapter 12 [217-235]), and finally, theories of lexical decomposition (Chapter 
13 [237-261]). 

Its introductory chapter usefully summarizes six basic approaches to 
semantics, besides outlining the basic problems of lexical semantics and 
explicating the distinction between lexical and grammatical meaning, the 
relation between word meaning and sentence meaning, and the interesting 
question of what are, and what are not, possible meanings of words in any
language (while some languages have a single word (or stem) for a meaning
like DRINK SLOWLY (cf. Eng. sip), none is known to have one for WOMAN
DRANK or THE WINE SLOWLY). A sample of topics from the rest of Part 2
includes: ambiguity; homonymy and polysemy; classical and prototype
approaches to categorization; basic-level categories; hyponymy, meronymy,
and synonymy; various kinds of opposites; taxonomic and meronymic
hierarchies; bipolar chains (e.g., minuscule, tiny, small, large, huge,
gigantic); metaphor; metonymy; semantic change; types of combinatory
abnormality (e.g., collocational clash); componential analysis; and semantic
primitives.

Part 3 ‘Semantics and Grammar’ consists of Chapter 14 ‘Grammatical
semantics’ (265-300), a survey of ‘those aspects of the meanings of larger
syntactic units which are attributable to grammar’ (263). It includes a detailed
description of the grammatical meanings associated with basic lexical
classes. The discussion of number on nouns, for example, mentions
singular, plural, dual, trial, and paucal; count nouns and mass nouns; basic 
count nouns used as mass nouns; basic mass nouns used as count nouns; the 
semi-mass use of count nouns; singular nouns with (optional) plural concord; 
plural nouns with (optional) singular concord. The discussion of aspect deals 
not only with distinctions like perfective/imperfective, perfect/prospective 
(the latter has to do with the present relevance of a future event), 
punctual/durative, punctual/iterative, and inchoative/medial/terminative, but 
also with the aspectual character of specific verb stems, which affects their
ability to occur with various aspectual markers, etc. 

Part 4 ‘Pragmatics’ consists of three chapters: Ch. 15 ‘Reference and deixis’ 
(305-327), Ch. 16 ‘Speech acts’ (329-346), and Ch. 17 ‘Implicatures’ (347- 
378). C’s approach here can be inferred from a list of the scholars whose 
work is most frequently referred to: Grice, Searle, Leech, and the relevance 
theorists Sperber, Wilson, and Blakemore. 

An unnumbered chapter, ‘Conclusion’ (379-381) rounds out the book with a 
list of what C sees as the six major areas of uncertainty about fundamental 
issues in meaning in language. 

The usefulness of this book is enhanced by the following features: a
bibliography of around 150 items; ‘Suggestions for further reading’ at the
end of each chapter; ‘Discussion questions and exercises’ at the end of
almost every chapter, with answers to most of the questions in a separate
section; a subject index; and an author index.

As a textbook, this is not for your beginning students (as C himself makes
clear). And it is not always easy for more advanced readers, especially those 
not at home in British English or, in some cases, English culture. For a 
textbook more accessible to students from a variety of national educational 
systems, try Saeed 1997, but get Cruse’s book for a thorough reference work 
on a number of semantic and pragmatic topics, especially in lexical 
semantics. Its combination of breadth and detail offers a useful array of
stimulating ideas to explore with regard to any language in its own right, as 
well as for purposes of application to effective translating. 

Reference 

Saeed, John I. 1997. Semantics. Oxford: Blackwell Publishers Ltd.

George Huttar, Box 24686, Nairobi, Kenya. E-mail: george_huttar@sil.org



Foundations of statistical natural language processing. By Christopher
D. Manning and Hinrich Schütze. Cambridge, MA: The MIT Press.
1999. 680 pp. $60.

Reviewed by MIKE MAXWELL 
SIL — International Administration

Statistical approaches to linguistic analysis predate generative linguistics — 
indeed, Noam Chomsky wrote the epitaph of statistical linguistics in the 
1950s. Why, then, is this textbook about statistical natural language 
processing (NLP) coming out now? A complete answer would be too large 
for this review; suffice to say that whatever the scientific reasons for 
preferring generative linguistics (or a competing theory), when it comes to 
implementing practical applications on computers, engineering takes the 
forefront. And for better or worse, engineers do not always imitate nature. 
This, then, is a book about engineering approaches to language, and 
specifically those approaches that generate probabilistic answers. 

While acknowledging that engineering approaches to NLP are indispensable 
in certain contexts, I feel compelled to devote a few paragraphs at the 
beginning of this review to questioning some of Manning and Schütze’s
more far-reaching claims, as they impinge on the science of linguistics. The 
following quote (taken from the beginning of a chapter on acquiring lexical 
patterns from large corpora) is one such claim (p. 265): 

While we discuss simply the ability of computers to learn lexical information 
from online texts, rather than in any way attempting to model human language 
acquisition, to the extent that such methods are successful, they tend to
undermine the classical Chomskyan arguments for an innate language faculty 
based on the perceived poverty of the stimulus. 

There is much that could be said about this. The learning of lexical 
information has seldom been the focus of arguments for an innate language 
faculty, and certainly not Chomsky’s focus; the fact that some 
generalizations may be drawn from large corpora does not negate arguments 
based on the poverty of stimulus, since there are well-known generalizations 
for which one is unlikely to find sufficient supporting data in corpora of any 
size. But rather than argue these points, I will observe that the above
statement can be turned on its head: to the extent that statistical methods by
themselves do not succeed in learning linguistic generalizations, they
reinforce the classical arguments for an innate language faculty. Time and
more research in statistical methods will tell how true this is — but there are 
hints even now. 1 At the end of the same chapter from which the above quote 
was taken, the authors write (p. 311): 

What does the future hold for lexical acquisition? One important trend is to look 
harder for sources of prior knowledge that can constrain the process of lexical 
acquisition. This is in contrast to earlier work that tried to start ‘from scratch’ 
and favored deriving everything from the corpus ... One important source of 
prior knowledge should be linguistic theory, which has been surprisingly 
underutilized in Statistical NLP. 

Having stated most of my negatives up front, I hasten to add that regardless 
of whether statistical NLP has anything to say about the innate vs. learned 
knowledge debate, there are many practical applications where statistical 
NLP shines. In field linguistics, for example, some potential applications 
include determining similarity among related languages/dialects; 
probabilistic disambiguation of parsed text (such as choosing the best 
morphological parse of a word, based on the words in its environment); 
preliminary categorization of collocations and word senses; and part-of-speech
tagging for purposes of syntactic analysis, not to mention potential
applications to literacy and translation. 2

1 For those interested in this issue, I will give one other quote, from a discussion on Markov
chains (p. 378):

Chomsky’s criticism still applies: Markov chains cannot fully model natural language.
What has changed is that approaches that emphasize technical goals such as solving a
particular task have become acceptable even if they are not founded on a theory that fully
explains language as a cognitive phenomenon.

This emphasis on engineering, not science, is prevalent throughout the text, and quite
understandable given the goals of NLP. But it makes the authors’ comments quoted in the text
to the effect that the results of statistical NLP ‘undermine’ arguments for an innate language
faculty seem rather curious.

One word of caution, though: there is no magic bullet. Statistical NLP tends 
to require large corpora. Just what ‘large’ means, and how that might vary
depending on such language-particular factors as amount of inflectional
morphology, is for the most part glossed over in this book. 3 What the
reader is occasionally told may prove discouraging to field linguists: the 
English text of Tom Sawyer, with 71,370 word tokens, is ‘a very small 
corpus by any standards, just big enough to illustrate a few basic points’ (p. 
21). It takes a long time to collect 70,000 words of text in a preliterate 
village. 

Foundations of Statistical Natural Language Processing is intended as a 
graduate level text, but is equally suited to the linguist who wants to know 
what all the fuss about statistical NLP is. One application that is not covered 
is that of speech recognition, which has been especially indebted to statistical 
approaches — including it would have made a large book considerably larger. 
There are a number of other topics to which the authors say they have not 
given in-depth coverage, including ‘machine learning, text categorization, 
information retrieval, and cognitive science’ (p. xxxi). Actually, a chapter 
each is devoted to information retrieval and text categorization, and the 
chapters on clustering and text categorization are especially indebted to 
machine learning techniques (in supervised and unsupervised classification). 

Following an introduction to the goals of statistical NLP, there are 
introductory chapters on probability theory, mathematical information 
theory, and linguistics (the latter with an emphasis on structures encountered 
in English). The reader is well advised to have some prior background in 
probability and statistics. My sense as a linguist is that in this book 
explanations of linguistic matters are often simplistic. This is not a 
condemnation; giving more attention to classical linguistics would only have 
made an already formidable volume even larger. 



2 I emphasize the word ‘preliminary’. Manning and Schütze write (p. 18):

It is interesting to look at the types of collocations that a purely linguistic analysis of text 
will discover if plenty of time and person power is available ... a wider variety of 
grammatical patterns is considered [in the manually compiled dictionary entries they 
show] ... Naturally, the quality of collocations is also higher than computer-generated 
lists — as we would expect from a manually produced compilation. 

3 Unfortunately, most statistical NLP seems to have been done on English, a language
notorious for its lack of inflectional morphology. 

Next follows an eminently practical chapter on the issues of working with
language corpora, including the problems theoretical linguists gloss over: 
punctuation, upper/lower case, word and sentence division, and text markup. 
(For markup, Manning and Schütze recommend SGML, but add that XML,
a slimmed-down version of SGML, is likely to take over for purposes of
NLP. My sense as I write this review is that this has already happened.) 
Also discussed are several tagging schemes for English; while these are 
referred to in later examples, the tags are too English-specific to be useful for 
other languages. 

The remainder of the book is divided into three sections. The first of these is 
‘Words’, which begins with a chapter concerning methods for discovering 
collocations, nicely illustrated by an example discriminating the meanings of 
the English adjectives ‘strong’ and ‘powerful’ based on which nouns
each word co-occurs with. This sort of application should be easy to add to a
concordancing program, although its usefulness will depend on how often
the words in question appear in one’s texts. (Manning and Schütze’s
example used a corpus of 14 million words, about 115 megabytes.)
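
To give a feel for how little is needed to start, here is a minimal sketch of the co-occurrence counting that underlies such a comparison. It only tallies the word immediately following each adjective in a made-up toy text; it does not reproduce the statistical tests Manning and Schütze describe, and the function name and sample sentence are my own assumptions.

```python
from collections import Counter


def following_words(tokens, target):
    """Count the words that immediately follow `target` in a list of tokens."""
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == target)


# A toy 'corpus'; a real comparison would need far more text.
text = ("strong tea and strong support but powerful computers "
        "and powerful engines and strong tea")
tokens = text.lower().split()

strong = following_words(tokens, "strong")
powerful = following_words(tokens, "powerful")

print(strong.most_common())
print(powerful.most_common())
```

With only a handful of occurrences the counts prove nothing, which is exactly the caveat above about how often the words appear in one's texts.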

Other chapters in this section discuss discovering word senses and lexical 
acquisition in general. A chapter on dealing with sparse data is also 
included, presumably because it can be illustrated by problems dealing with 
words. The authors briefly discuss a technique needed for languages with a 
great deal more inflectional affixation than English, namely stemming 
(removing inflectional affixes). One wonders whether certain purposes 
might not be better served by ‘affixing’, that is, ignoring the stems of 
inflected words. 

The section on ‘Grammar’ includes Markov models, which are probabilistic
finite state automatons. Markov models are often useful for predicting the 
next event based on the last few events; in a linguistic context, ‘events’ 
might be words or parts of speech. For certain kinds of Markov models, 
‘grammars’ can be learned automatically by a computer from a corpus. 
While these are not what a linguist would think of as the grammar of a 
natural language, this may not matter for some applications. One such 
application is further developed in the next chapter on part-of-speech (POS) 
tagging. Since investigators working on POS tagging have worked mostly on 
English, the techniques have emphasized syntagmatic context and lexical 
information, largely ignoring inflectional affixation. This has two
implications for tagging in other languages. First, POS tagging should work well
for other languages which largely lack inflection. Secondly, tagging may
prove useful in more highly inflected languages for disambiguating
morphologically derived POS information, although to the extent that such a
language has ‘free’ word order, syntagmatically-based POS tagging may not
work well (as Manning and Schütze observe).

The grammar section is rounded out by chapters on probabilistic context-free 
grammars (phrase structure grammars with rules annotated for their 
probability) and probabilistic parsing (which involves assigning a probability 
to a particular structural analysis given its context, one application of which 
is disambiguating syntactic parses probabilistically). 
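
As a small worked example of rules annotated for probability, the sketch below scores one parse tree as the product of the probabilities of the rules used to build it. The grammar, its probabilities, and the tuple encoding of trees are invented for illustration and are not taken from the book.

```python
# A toy probabilistic context-free grammar: (left-hand side, right-hand side)
# mapped to a probability. For each left-hand side the probabilities sum to 1.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dogs",)): 0.4,
    ("NP", ("cats",)): 0.6,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("chase",)): 1.0,
}


def tree_probability(tree):
    """Multiply the probabilities of all rules used in a parse tree.

    A tree is a tuple (label, child, child, ...); a child is either
    another tree or a plain string for a word.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    probability = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            probability *= tree_probability(child)
    return probability


parse = ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats")))
p = tree_probability(parse)
print(p)  # roughly 1.0 * 0.4 * 0.7 * 1.0 * 0.6
```

When a sentence has several possible parses, comparing these products is one simple way of choosing among them.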

The section entitled ‘Applications and Techniques’ includes a chapter on the 
alignment of bilingual corpora, a technique which has been used to create 
bilingual dictionaries and parallel grammars. Attempts have also been made 
to use the results of such alignment directly for statistically-based machine 
translation, but Manning and Schütze suggest that these efforts have failed
because they incorporate too little linguistic knowledge. Alignment is most
easily done with literal translations of texts such as legal documents;
Manning and Schütze state that since ‘religious and literary works’ tend not
to be translated so literally, they are more difficult to align. 4 Structural 
differences between the languages whose texts are to be aligned naturally 
add to the difficulty, although the authors cite work that has been done to 
overcome this. 

A chapter on clustering compares algorithms for finding similarities among 
objects in general, and words in particular. An example would be grouping 
words into classes which might correspond to parts of speech. The authors 
note that ‘The efficiency of clustering algorithms is becoming more 
important as text collections and NLP data sets increase in size’ (p. 527), 
which implies that these techniques may prove less useful in field linguistics. 

Finally, there are two chapters on information retrieval and text 
categorization, applications of obvious importance in this day of the Internet. 

For the most part, the text seems admirably clear; charts, graphs and 
formulas are well laid-out, and mathematical derivations are straightforward. 
Technical terms are indicated in the margins at their first appearance, which 
together with a passable index makes it easy to look up a term you missed 
the first time through. (This reviewer confesses to having needed that help 
on more than one occasion.) However, while proper names are well 
represented in the index, topics are not as well indexed. For example, there 
is one entry starting with COMPUT (‘computational lexicography’), which is
either one too many (since nearly every section talks about computation), or
far too few; there is no entry for ‘programming language’ (or anything
similar), although the topic is discussed on page 121.

4 On the other hand, scripture, as well as some literature such as plays, often comes
pre-aligned at roughly the sentence level, which should make word-level alignment easier.

Scattered throughout the text are exercises, some of the pencil-and-paper 
variety, some which require the use of publicly available programs, and 
some which require actual programming. 

The authors (and the publisher) are to be commended for setting up a 
companion web site (http://nlp.stanford.edu/fsnlp) containing
(or pointing to) the complete table of contents and two sample chapters;
corpora; executable programs; and, not least, errata. The latter is sizeable,
but given the technical nature of this book, and the large number of 
mathematical formulae, I would hesitate to call it unreasonable. (Some of 
the errata were corrected in the second printing.) 

Mike Maxwell, 7809 Radin Rd., Waxhaw, NC 28173. E-mail: Mike_Maxwell@sil.org



Demonstratives: Form, function and grammaticalization. By Holger
Diessel. Typological Studies in Language, 42. Amsterdam: John
Benjamins. 1999.

Reviewed by DAVID MEAD 
SIL — Indonesia Branch 

Ask a linguist how much they know about demonstratives and they might 
answer: some languages have ‘person-oriented’ systems (having a special 
term for something near the addressee) while other languages have
‘distance-oriented’ systems; languages differ in regard to how many distance-terms
they distinguish; some languages distinguish elevation in their demonstrative 
system, others don’t, etc. 

According to Holger Diessel, factors such as these are important for 
understanding the semantics of demonstratives, but semantics is only one 
aspect. There are, in fact, three other facets of demonstratives which must be 
grasped in order to understand and fully describe these forms, either cross- 
linguistically or in a particular language. 

These four facets are described below. At the outset, I should say that I found
Holger Diessel’s book insightful and well organized, and I recommend it highly,
with only a few criticisms mentioned below.

Foundational to Diessel’s approach to demonstratives is his insistence that 
we clearly distinguish the syntactic positions (distributions) in which 






Notes on Linguistics 4.1 (2001) 



demonstratives may occur. Minimally there is a need to recognize (a) 
adnominal demonstratives which appear with a co-occurring noun (pop THAT 
balloon); (b) pronominal demonstratives which occur independently in 
argument positions of verbs and adpositions (hit THAT again); (c) adverbial 
demonstratives which are clausal level modifiers (leave it THERE); and (d) 
identificational demonstratives, which occur in copular and non-verbal 
clauses (THAT is my cousin). An important reason to keep these distinctions 
clear is that some languages (such as English) may use the same form in two 
or more of these syntactic positions, while other languages may formally 
distinguish demonstratives in all four positions. Padoe, an Austronesian 
language of Indonesia, is of the latter type. Diessel introduces four 
corresponding category labels, which may appropriately be used of Padoe, 
since this language has four formally distinct categories of demonstratives 
(Padoe data are from my own sources): 



no-langkai      mia        la’a
he-big          person     that
‘that person is big’ (demonstrative determiner)

no-tarima-'o    kee        ula’a
he-receive-it   INTERROG   that
‘did he receive that?’ (demonstrative pronoun)

m-powangu       raha       lehea
PLURAL-build    house      there
‘(they) built a house there’ (demonstrative adverb)

tila’a   inehu       henu   winawa-nggu
that     vegetable   REL    brought-my
‘those are the vegetables I brought’ (demonstrative identifier)



(Two additional categories which languages may formally distinguish are
manner demonstratives such as Padoe helela'a ‘like that’, and deictic
presentatives such as French voilà.) English, on the other hand, notes Diessel,



does not distinguish between demonstrative pronouns and demonstrative 
identifiers. The demonstratives in copular clauses have the same phonological 
and morphological features as pronominal demonstratives in other contexts and 
hence they are considered demonstrative pronouns. 

Once this basic distinction between distribution and category is grasped, the
rest of the book makes for easy reading. 

The book itself is organized into seven chapters. Chapter 1, the introduction, 
lays the groundwork for Diessel’s study and introduces the distinction 
between distribution versus category. Chapter 2 is a look in detail at the 
demonstrative systems of four different languages, giving the reader an 
expectation of how widely languages can differ in this area. 

The next four chapters constitute the heart of the book, and each in turn deals 
with one of the four facets important for fully understanding demonstrative 
systems. Chapter 3 outlines and gives examples of the SEMANTIC 
distinctions which may be encoded in demonstrative systems (e.g. distance, 
visibility, elevation, animacy, number, etc.). Chapter 4 considers the 
SYNTAX of demonstratives, wherein Diessel revisits the distribution versus 
category distinction in much greater detail. Chapter 5 looks at the 
PRAGMATIC uses of demonstratives, that is, their primary, exophoric use in
referring to objects present in the speech situation, and various endophoric
uses such as when demonstratives are used in referring to participants or
propositions in surrounding discourse. Chapter 6 looks at the
GRAMMATICALIZATION of demonstratives. Here again one must pay
attention to distribution: only PRONOMINAL demonstratives, for example, are
likely to become third person pronouns, while only ADNOMINAL
demonstratives are likely to grammaticize to become definite articles.
Finally the entire contents of the book are summarized in the concluding
Chapter 7.



Diessel is to be commended for including data from a wealth of languages. 
Ideas are explained clearly, and the organization of the book is such that,
once the basic principles are grasped, one could probably read the
remaining chapters in any order. Field workers at a beginning stage of 
analyzing a language are likely to find chapters 3 and 4 to be the most useful. 
The remaining chapters also have their place, and in fact chapter 6 on 
grammaticalization may hold the key for explaining ‘odd’ behavior of 
certain demonstratives (to wit, they are on their way to becoming some other 
grammatical category). 

I do have a few criticisms of the book. One is that Diessel addresses the 
issue of demonstrative directionals only incidentally. He notes, for example, 
that one semantic category that demonstratives may encode is direction 
(toward speaker, away from speaker, across field of vision of speaker), but 
curiously the English forms hither, thither, hence and thence never come up
for discussion (later, however, the German directionals hin- and her- are
given as examples of demonstratives which have grammaticalized as
preverbs). This left me at a loss as to how to incorporate into his system
certain languages of Indonesia which, in addition to having all four categories
of demonstratives given above, have yet another category of demonstratives
which imply motion in a particular direction, e.g. Padoe ramai ‘there
(coming toward here)’, apparently distributing as a verb and as a noun
modifier.



Regarding his category of demonstrative identifiers, Diessel considers both
French c’ (as in c’est Pascal ‘this is Pascal’) and Ponapean iet (as in iet
noumw pinselen ‘here is your pencil’) to represent this category. But as the
English glosses indicate (c’ being translated as ‘this’, iet as ‘here’), one
wonders if this collapsing is really justified.

In addition, there is surely more to discover regarding the
grammaticalization of demonstratives, and Diessel’s chapter 6 cannot yet be the
complete story. In Padoe alone one may note the grammaticalization of the deictic
presentative nio (be.here-it) as an existential particle, and the
grammaticalization of the demonstrative determiner sie ‘that (near)’ as a
subordinate clause marker (ro-me-hawe sie... /they-PLURAL-arrive that/
‘upon their arrival ...’). Neither of these possibilities is mentioned by
Diessel. By the way, his table on page 155, titled The Grammaticalization of
Demonstratives, lists more possibilities than are actually described in the 
accompanying text. One wonders if this could have resulted from editorial 
changes to the text that were not incorporated into the table. 

The reader should also be aware that Diessel uses the term ‘expletive’ in its 
secondary, less encountered sense which he defines as ‘semantically empty 
pro-forms that some languages require to form certain syntactic 
constructions’ (p. 149) — compare there in there was nothing left. Since 
cognitive linguistics studies have cast doubt on the entire concept of 
semantically empty pro-forms, Diessel’s use of the term is doubly 
regrettable. 

David Mead, P. O. Box 81439, 8000 Davao City, Philippines. E-mail: don_mcad@sil.org



The Cambridge Dictionary of American English. 

By SIDNEY I. Landau, ed. New York: Cambridge University Press. 

2000. 1,069 pp. Paper $15.95, CD-ROM $20.95.

Reviewed by JOHN E. STARK 
University of Ilorin, Linguistics Dept.

The renowned fictional detective, Nero Wolfe, once sat by the fireplace in
his front room burning the third edition of Webster’s New International
[...] I have long admired Wolfe's command of the English language, and was
curious to know why he felt that way. A few pages later in the novel, Wolfe's
assistant Archie Goodwin [...] Webster's 3rd [...] use imply and infer
interchangeably. A few years back I found a copy of Webster's 3rd [...]
stated the definitions chosen were determined by common usage. [...] Wolfe's
attitude toward language was prescriptive (Landau 2000).

Unless his linguistic reading has converted Wolfe to a [...] American English
is used now [...] based on [...] of written and spoken English taken from
books [...] (Grey 1935, 1991). In addition, [...] any type of synonym,
cross-reference or thesaurus material. [...] person holding a bachelor's
degree from an American university [...] surprisingly inadequate, but [...]

The work was prepared with consultation from [...] target audience of
non-English speakers, people without college education, and classroom use for
high school and lower grades. Nonetheless, it has features of interest to
every speaker of English.

One special feature is a series of boxes embedded in the text at the location
of a key word for a functional feature of English, called Language Portraits.
On the page spread for ‘sight’ to ‘silently’ is a box labeled ‘Silent Letters’ with
‘examples and rules where possible’ for when a, b, c, d, e, [...], h, k, l, [...]
and w are silent in written English. Another feature I found helpful is the
inclusion of ‘key words’ behind the entry to indicate the nature of the
following definition: e.g. mania [strong interest] / mania [mental illness],
each followed by an appropriate expansion of the meaning involved.

This copy came with a software version on CD-ROM. It contains the full
body of the dictionary, and an audible pronunciation feature that does a good
job of producing a standard American English pronunciation of the word in
isolation. However, I have used other computer dictionaries that had a more
satisfying user interface (‘the connection between a person and a computer’
(Landau 2000)). Without a specific complaint against the Cambridge software,
it just felt cheesy (‘adj cheap or of low quality’ (Landau 2000)).

This dictionary has elements which would be useful to incorporate into 
mother-tongue dictionaries: the arrangement of head-words and subentries, 
the quality of the illustrative sentences, and the Language Portraits would all 
bear examination as models for other dictionaries. It would also be a very 
helpful tool for second-language English speakers, and I expect that it will 
quickly become the most popular reference dictionary in the Kambari 
Language Project office, where a full-time staff of eight people, from three 
Nigerian languages, work to produce literacy and translation materials. 

That the Cambridge Dictionary example sentence describes Wolfe so aptly is
an example of the success of a descriptive approach. It receives a ‘should
buy’ rating.

Direct reference, indexicality, and propositional attitudes. By Wolfgang
Künne, Albert Newen, and Martin Anduschus, eds. CSLI Lecture Notes
No. 70. Stanford, CA. New York: Cambridge University Press. 1997. 402
pp. Hardback $69.95, paper $24.95.

Reviewed by Carl Whitehead 
SIL-Papua New Guinea 

Most of this volume consists of revised versions of papers presented at a
conference of the same name held at Bielefeld, Germany in March 1994.
(Two of the 18 papers were submitted later.) The conference brought
together linguists and philosophers working in the domain of possible world
semantics, specifically within the framework of David Kaplan’s
Demonstratives: An Essay on the Semantics, Logic, Metaphysics, and Epistemology of
Demonstratives and Other Indexicals, which was eventually published in
1989 (Themes from Kaplan, edited by J. Almog, J. Perry and H. Wettstein.
New York: Oxford University Press, pp. 481-563). The articles are by no
means introductory and one author states explicitly: ‘I have to assume in this
paper that the reader is acquainted with Kaplan’s semantical framework’
(Nida-Rümelin p. 385). Several of the papers make use of semantic
formalism which can be confusing to the uninitiated. The volume is divided into
three sections: Indexicals and names (eight papers), Attitude reports (five
papers), and Natural kind terms and color terms (three papers). For the sake of
readers who are not familiar with the term indexicals, in the pure sense they are
terms which refer directly to the utterance context such as I, here and now.

I found the volume useful for giving insight into another realm of linguistic 
semantics and for bringing the realisation that even the use of pronouns and 
proper names is semantically far more complex than we normally assume. 
Yet for most SIL translator/linguists, not having the background knowledge 
of possible world semantics and the work of Kaplan specifically, the insights 
gained from the articles would not warrant the time invested in 
understanding them. 

What follows is a summary of the articles in the first section of the book. 
This should not be taken as an indication that this section is more accessible 
or valuable than the others. It is simply that, having skimmed the collection 
and then read this section in sufficient detail to provide the following 
summaries, considerations of time and length of the report dictated this 
limitation. 



In the opening article, ‘Reflexivity, indexicality and names’, John Perry
seeks to explain the difference in cognitive significance between ‘I am a
computer scientist’ and ‘David Israel is a computer scientist’, both said by
David Israel. This is accomplished by acknowledging the difference
between reflexive and incremental content. The former of these refers to 
truth conditions which are about the utterance itself and is present in the first 
sentence to a greater degree than in the second. Indexicals and (proper) 
names have in common that they refer rather than describe (that is, they 
contribute the entire object they designate to the content); they differ in that 
whereas indexicals specify a condition the referent must meet, names 
designate their referent directly. 

In ‘Tensed thoughts’ James Higginbotham also appeals to reflexivity but to
differentiate between tensed and tenseless thoughts. Tensed thoughts
predicate a state which exists at the time of the thought. Beliefs, wishes,
feelings of regret, relief, or anticipation ‘can only be directed toward
thoughts that are themselves tensed or else supported by tensed thoughts,
which locate the time reference of the untensed thoughts with respect to the
thinker’s present state’ (p. 24).

The following three articles all deal specifically with first-person reference
(I). Wolfgang Künne (‘First person propositions: A Fregean account’) asks:
‘What is the Fregean sense [=mode of presentation] which completes the
sense of the predication “has blood type A” in [“I have blood type A”]?’ and
answers it from within the notions proposed by Frege around the turn of the
century. He proposes a formalization of the ‘ego mode of presentation’
which is extendable to other indexicals. While working within Fregean
philosophy, he does disagree with Frege on some points, including ‘the idea
that successful communication culminates in content sharing’ (p. 65). I is a
hybrid proper name with each user forming a different TYPE expression
referring to a proper name, and the speaker will never be able to successfully
communicate the same content for I that he/she has in mind. Using
substitute indexical counterparts, however, the addressee can express the
same thought as the original.

Christopher Peacocke (‘First-person reference, representational independence,
and self-knowledge’) differentiates between representationally dependent
uses of I and representationally independent uses. In the former (p. 72):

...the thinker is in a state which represents the content C as correct, and the 
content represented as correct is one which stands in such a justificational 
relation to the content of the belief ‘I am F’. 

Representationally independent uses, such as I see the phone is on the table,
are the result of a transition from the occurrence of a conscious event or state
to a self-ascription. He challenges the assertion by Kant and others that such
uses of I do not refer but have a transcendental or metaphysical subject. He uses
the dependent/independent distinction to explain how such sentences do
have subjects which do refer.

Albert Newen (‘The logic of indexical thoughts and the metaphysics of the
“self”’) proposes and defends a new logical formalization for thoughts de se
(about oneself). He summarizes the attributes which such a formal
representation would need to meet and demonstrates how his proposal best does so.
As with Künne, the proposal includes a special ego mode of presentation, to
‘represent the immediate way in which I am related to myself while having a
thought de se’ (118). He adopts in part an earlier proposal of Peacocke’s
distinguishing between a type mode of presentation and a token mode
[...]. Under the new proposal, the type mode of presentation (MP) is
indexed by the context to yield a token mode. The formalism can be
generalized to apply not only to de se thoughts but also to de re (about other
[...]) thoughts. The generalized version is <[MP context_i]; object o>, which,
for Mach’s thought I am a shabby pedagogue, would be filled out as:
(<[ego context_i]; Mach>; being a shabby pedagogue).



Thomas Zimmermann’s ‘The addressing puzzle’ is the only article which
focuses on the second person indexicals, arguing that the difference between
formal and informal pronouns (primarily German Sie and du) is pragmatic
rather than semantic (truth-condition-related). On a more general note he
also argues that ‘No two sentences are uttered in precisely the same
(possible) situations and hence they cannot have the same truth conditions’.
The Most Certain Principle is, therefore, replaced with the Least Certain
Principle: If S and S’ are sentences, then S and S’ differ in truth conditional
meaning.



The final two papers in the section reject Frege's proposal of senses or
modes of presentation as being unnecessary. Henk Zeevat, in ‘The
mechanics of the counterpart relation’, presents a model of how meanings
(objects and ideas) are communicated from one mind to another.
Counterparts can be created within the mind either of real external items or,
via communication, of items internal to the speaker’s mind. Reference to
non-existent objects can then be explained as reference to items that are only
internal to the speaker’s mind, whereas Frege claimed that mention of
non-existent objects was not reference at all. The nature of presupposition and
assertion is discussed in detail, including the accumulative nature of
presupposition and the process of finding a referent with the body of
presupposition resulting in either resolution (finding the referent) or
accommodation (adding a new item). Ernesto Napoli’s ‘Names, indexicals
and identity statements’ addresses the nature of proper names and indexicals
and argues that both are meaningless free variables which only refer in the
context of an utterance. Frege’s puzzle as to how an identity statement
containing two proper names (e.g. Tom is Dick) can be informative whereas
identity statements of the form a=a are analytically true and therefore not
informative, is answered by challenging those evaluations of
informativeness. Since proper names are not constants, ‘an identity statement of the
form a=a can be false and a fortiori not a priori true’ (192).

Carl Whitehead, P. O. Box 4997, Three Hill, Alberta TOM 2N0 Canada. 

E-mail: whitehead@will.kneehill.com 

Bilingual acquisition: Theoretical implications of a case study. By
Margaret Deuchar and Suzanne Quay. 2000. New York:
Oxford University Press. 163 pp. Cloth $70.00.

Reviewed by Catherine Young 
SIL — Philippines Branch 

Bilingual Acquisition presents the findings of a case study in bilingual 
language acquisition and explores the implications of these findings for 
theories of first and second language acquisition. The research that forms 
the basis for this book began as an ESRC and British Academy funded
project entitled ‘Infant Bilingualism: One System or Two?’ Research grants
enabled Margaret Deuchar to collect audio and video data from her developing
English-Spanish bilingual daughter between the ages of 1:3 and 3:3. The
primary aim of this book is to explore the implications for linguistic theory
of a case study in bilingual acquisition, which involves the discussion of a
number of general theoretical questions:

• Can phonological distinctions be acquired on acoustic evidence alone? 

• Does lexical development involve an avoidance of synonymy? 

• Can all words in early two-word utterances be assigned to lexical 
categories? 

• How do young children make appropriate language choices? 

Specific implications for bilingual acquisition include the following
questions:

• Does the bilingual child have one or two linguistic systems from the 
beginning? 

• What criteria should be used in identifying one versus two systems? 

• What are the most important determinants of language choice for the 
developing bilingual? 

Significant description of the methodology involved in eliciting the data for 
the study, a combination of journal and audiovisual records, is included and 
would be very useful for anyone intending to plan similar case studies. 

Bilingual Acquisition contains extensive appendices reflecting the child’s 
cumulative lexicon and multiword utterances. 

Catherine Young; P. O. Box 2270, CPO; 1099 Manila, Philippines. 

E-mail: catherine_young@sil.org

Coordinator’s Corner

The value of historical and comparative linguistics 

Historical linguistics got its main impetus in 1786, when Sir William Jones 
wrote: 



The Sanscrit language, whatever be its antiquity, is of a wonderful 
structure; more perfect than the Greek, more copious than the Latin, and more 
exquisitely refined than either, yet bearing to both of them a stronger affinity, 
both in the roots of verbs and in the forms of grammar, than could 
possibly have been produced by accident; so strong indeed, that no philologer 
could examine them all three, without believing them to have sprung from some 
common source, which perhaps no longer exists... 

In ‘examining them all three’, one would find such similarities as the word
for ‘two’, which is Greek dúo, Latin duo, and Sanskrit dvau. The root for
‘foot’ is Greek pod-, Latin ped-, and Sanskrit pad- (Hock and Joseph 1996).
Jones’ claim set off a flurry of research. Eventually the ancestral language 
of Greek, Latin, and Sanskrit (now spelled with a k) was reconstructed, 
labeled Proto-Indo-European, and is still an area of intense historical research. 

Language changes! Below is the first sentence of the ‘Lord’s Prayer’ in Old
English, around 1100 A.D., along with the King James Version of 1611:

Fæder ure þu þe eart on heofonum, si þin nama gehalgod.

Our father who art in heaven, hallowed be thy name.

Syntax, semantics, sounds, orthography: all change. We see many of these
illustrated in the 500 year span shown above. (The ‘þ’ above is a voiceless
interdental fricative called ‘thorn’, and ‘æ’ has its modern IPA value.)

The theme of this issue of Notes on Linguistics is historical and comparative 
linguistics. One might think that these are not SIL’s concern; we are more 
concerned with modern languages. But historically (!) SIL has contributed 
much to these areas, as in the reconstruction of Proto-Otomanguean, at a 
time depth comparable to Proto-Indo-European. David Thomas and others 
have contributed considerably to Mon-Khmer studies. SIL’s on-line 
bibliography reveals 584 entries listed under ‘historical’ and ‘comparative’. 
Why this emphasis? 

Historical linguistics can give you an explanation for an otherwise puzzling
language pattern. In Konni (Cahill 2000/2001), many nouns have a stem-final
/g/ which deletes before the singular suffix -ŋ, as in /kug-ŋ/ → kuŋ
‘tree (sp.)’. But in other noun stems ending in /g/, the /g/ does not delete, but
a vowel is inserted, as in /kug-ŋ/ → kuguŋ ‘cooking place’. The stem g
shows up in the plural forms for both below.

   class          citation form   plural    gloss
a. Noun Class 3   ku-ŋ            kug-usi   ‘tree (sp.)’
b. Noun Class 1   kug-uŋ          kug-e     ‘cooking place’

The environments for the /g/ in the citation forms of these two words are 
virtually identical, differing only in noun class. Whether the rule applies or 
not seems to be arbitrarily a function of what noun class the word belongs to. 
The pattern here can be traced back to the historical sources of these nouns. 
Comparing Konni to its nearest relative Buli, we find that Noun Class 3 
words of ‘Proto-Buli-Konni’ did not have a g in the root, but a k. This k was 
originally a suffix indicating Class 3, but has been re-interpreted as part of 
the root. So the protoform *kuk became kug, and the protoform *kuk-si 
became kug-usi. But in Noun Class 1, the protoroots did have g. The form
*kug-i became kug-uŋ, and *kug-a became kug-e. We can make sense out
of this apparently arbitrary pattern by a historical approach.
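
For readers who like to see a rule stated mechanically, the synchronic pattern just described can be sketched in a few lines of Python. This is only a toy illustration, not Cahill's analysis: the function names are hypothetical, the inserted vowel is assumed to be a fixed u, and the forms are written without tone or other detail.

```python
# Toy sketch of the Konni pattern described above (all names are hypothetical).
SINGULAR_SUFFIX = "ŋ"
PLURAL_SUFFIX = {1: "e", 3: "usi"}  # only the two classes cited in the text


def citation_form(stem: str, noun_class: int) -> str:
    """Attach the singular suffix to a stem, treating stem-final g by noun class."""
    if stem.endswith("g"):
        if noun_class == 3:
            # Class 3: the stem-final g deletes before the suffix.
            return stem[:-1] + SINGULAR_SUFFIX
        if noun_class == 1:
            # Class 1: g stays and a vowel is inserted (assumed here to be u).
            return stem + "u" + SINGULAR_SUFFIX
    return stem + SINGULAR_SUFFIX


def plural_form(stem: str, noun_class: int) -> str:
    """The stem-final g surfaces unchanged before the plural suffix."""
    return stem + PLURAL_SUFFIX[noun_class]


for gloss, noun_class in [("tree (sp.)", 3), ("cooking place", 1)]:
    print(noun_class, citation_form("kug", noun_class),
          plural_form("kug", noun_class), gloss)
```

Note that the rule has to consult the noun class, which is exactly the arbitrariness that the historical account removes: once the Class 3 g is traced to an earlier k, the two stems no longer look alike.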

Sometimes historical linguistics can even solve serious sociolinguistic
problems. David Thomas tells of the Chrau orthography in Vietnam, which
first followed what seemed to be a fairly central dialect, but some people
from another very large area objected. Eventually, all agreed to follow the
‘early Chrau’ forms which the Thomases had reconstructed (and which were
fairly close to what they had been using), and this satisfied everyone.

In this issue of Notes on Linguistics, Joe Grimes gives an overview of the 
value of historical linguistics in fieldwork, and cites a number of SIL- 
authored historical publications. Paulette Hopple’s paper summarizes the
contribution of comparative studies in Southeast Asia. Bob Longacre
reviews SIL’s contribution to Proto-Otomanguean studies, and looks at some
current issues. I think you will find these to be an interesting set of articles. 

References 
African Languages XVIII.1: 49-69.

The Value of Comparative Linguistics

Joseph E. Grimes 

Cornell University and Summer Institute of Linguistics 

Not long ago, in a country not all that far away, the sixth in a series of 
language survey teams was sent into an area. The first five had either looked 
at only part of the area, or had not been able to form conclusions on the basis 
of the limited kinds of data they had been directed to collect. This was not 
surprising; other linguists who had looked over the area had found their own 
data problematic too. 

Fortunately, the surveyors for the sixth round went beyond what they were 
instructed to do. They looked at some of the sound changes, which showed a 
regular pattern. For example, two of the varieties examined show a 
neutralization of voiced obstruents to voiceless in the final syllable of noun 
stems and the final consonant of verb stems. Three other varieties retain
root-medial /k/ between vowels, while the two first mentioned, along with
four others, drop the /k/ and coalesce like vowels that come into juxtaposition
as a result of the loss into a single vowel with a falling tone. They also
looked at noun classifier regularities and forms of the negative. That type of 
structurally based observation could have been a sound basis for a definitive 
interpretation of the survey data had they taken it further. 

Systematic comparisons such as the ones they did, based on proven 
principles, accomplish one of two things: either they give a highly reliable 
picture of language relationships that is not scrambled by random factors like 
linguistic borrowing, or they confirm that language relationships in an area 
are indeed crisscrossing and snarled — but then they pinpoint what the areas 
of uncertainty are. 

What seems to be forgotten when survey teams encounter such complex 
situations is that comparative linguistics is the most highly developed means 
we linguists have for getting our bearings in the many-sided world of 
language varieties. Typology and the theory of universals also orient us to 
thinking across languages, but in other ways; it is comparative linguistics 
that gives us our best handle on language diversity. 

The heart of comparative linguistics is the identification of systematic, 
patterned similarities and differences across speech varieties — phonology, 
morphophonemics, morphology, syntax, and semantics.¹



1 Fascinating examples of how all these intertwine can be found in Buck 1949. 

These regularities fit a model in which changes in speech behavior propagate
over time to some but not all of the speakers of a speech variety. Who is or is 
not affected by the changes depends on factors like geographic proximity and 
social group identification, so that migrations and political fallings out or 
alliances play a role in the process, even though what results is best described 
by linguistics. 

Changes appear all the time. Some may be accepted widely in a short time; 
others creep slowly through social space. Others are ignored, or have only 
local influence. 

The net impact of all changes over time is that some speech varieties are very 
similar to each other, and others less similar. Still others do not show any 
patterned similarities at all, evidence either that they come from different 
streams of development or else that they parted company so long ago that the 
patterns have faded beyond our reach. 

The relationships implied by recognizable regularities yield something close to
the kind of family tree seen in biological studies.² The tree metaphor is often
looked down on because it requires considerable commentary about anomalies
in relationships that are not quite tree-like.³ Still, trees are a reasonable
idealization for most of what happens in linguistic change. Clusters in
multidimensional space would be a better idealization, but linguists in general
are not yet comfortable with that mode of description.⁴

We all wish we could reduce the complexity of factors that result in relative 
closeness or distance among varieties to simple numbers, as if linguistic 
relationships were a linear measure like distance on a map.⁵ Comparative

2 The family trees drawn by genealogists and animal breeders trace the ancestry of particular 
individuals through sexual unions that produce offspring. Those used in biological sciences 
such as virology trace populations (such as new strains of the HIV virus), not individuals. 
Language family trees are of the latter variety, showing junctures where speech communities 
diverge, not the speech behavior of individuals. 

3 A good example is Chafe and Foster’s observation (1981) on the development of the
Iroquoian languages, that Tuscarora and Cayuga diverged from the rest in the earliest
recoverable split, but later, still carrying the effects of that change, Cayuga joined back with the
rest in behaving like languages such as Seneca with respect to some much later changes.

4 One possible line of thought is opened up in Grimes and Agard 1959 (the beginning of 
what Howard McKaughan later dubbed “phonostatistics”), footnote 12, where the results of 
comparative phonology are presented as clusterings of multidimensional vectors. Various 
means of interpreting such data are discussed in Gauch 1982 and more recent software packages 
for the display of numerical data. 

5 A linear measure is one like measures of length, where a centimeter on one part of the scale 
is the same as a centimeter on any other part of the scale. Much of the universe is nonlinear; 




